====== Revisions to Slurm Configuration ======

This document summarizes alterations to the Slurm job scheduler configuration on the Caviness cluster.

===== Issues =====

==== Priority-access ====

The Slurm job scheduler handles the tasks of accepting computational work and metadata concerning the resources that work will require (a job); prioritizing the list of zero or more jobs that are awaiting execution; and allocating resources to pending jobs and starting their execution.

When jobs are submitted to Slurm, zero or more partitions may be requested.
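
For example, a job may name several partitions and Slurm will start it in whichever of them can schedule it first. A minimal sketch (the partition names are illustrative, not actual Caviness partitions):

<code bash>
# The job is eligible in either partition; Slurm runs it in the one
# that can allocate its resources earliest.
$ sbatch --partition=compute,gpu job.qs
</code>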

Consider a workgroup that purchased one baseline node and one GPU node. Assume the state of the associated partitions is:
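
A hypothetical ''sinfo'' view of such a pair of partitions (all names, node counts, and limits below are illustrative):

<code bash>
$ sinfo --partition=compute,gpu
PARTITION AVAIL  TIMELIMIT  NODES  STATE NODELIST
compute      up 7-00:00:00     48   idle r00n[00-47]
gpu          up 7-00:00:00      8   idle r00g[00-07]
</code>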

Fine-grained control over resource limits on Slurm partitions must be implemented with a quality-of-service (QOS) definition.

The current configuration requires that each workgroup receive a QOS containing their aggregate purchased-resource limits, and that QOS be allowed to augment the baseline QOS of each partition to which the workgroup has access.

QOS is most often used to alter the scheduling behavior of a job, increasing or decreasing the baseline priority or run time limit, for example.
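
As a sketch of the mechanism (the QOS name and limits here are hypothetical, not the actual Caviness values):

<code bash>
# Create a QOS that embodies a workgroup's aggregate purchased resources:
$ sacctmgr add qos it_nss GrpTRES=cpu=72,gres/gpu=2
# The QOS is then listed in AllowQOS on each partition the workgroup may
# use, e.g. in slurm.conf:
#    PartitionName=compute ... AllowQOS=normal,it_nss
</code>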

There are currently six hardware-specific partitions configured on Caviness.

Following our recommendation to purchase shares annually, a workgroup could easily end up with access to many hardware-specific partitions and no effective means of using all of its purchased resources at priority.

When GPUs were introduced on Farber, some workgroups desired that GPU-bound jobs requiring only a single controlling CPU core be scheduled as such, leaving the other cores on that CPU available for non-GPU workloads.
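
In Slurm terms, such a job would look something like the following sketch (option values illustrative):

<code bash>
# One controlling CPU core plus one GPU; the remaining cores on the node
# stay available to other (non-GPU) jobs.
$ sbatch --ntasks=1 --gres=gpu:1 job.qs
</code>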

===== Solution =====

The use of workgroup partitions, akin to the owner queues on Mills and Farber, suggests itself as a viable solution.

In the spirit of the spillover queues on Mills and Farber, the workgroup has priority access to the kinds of nodes it purchased, not just specific nodes in the cluster.

This not only provides the necessary resource quota on the partition, but leaves the override QOS available for other purposes (as discussed in Problem 2). Since QOS resource limits are aggregate across all partitions using that QOS:
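
A minimal illustration of why per-partition quotas matter (partition and QOS names hypothetical):

<code>
# If two partitions share one QOS, a GrpTRES=cpu=72 limit on that QOS
# caps jobs in both partitions combined at 72 CPUs; it is not a
# per-partition quota of 72 CPUs each.
PartitionName=wg_compute QOS=wg ...
PartitionName=wg_gpu     QOS=wg ...
</code>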

This solution would not address the addition of partitions over time:

Existing workgroups that augment their purchase would have their existing partition altered accordingly.

Likewise, Problem 4 is addressed:

Priority access to GPU nodes would no longer be allocated by socket.

The existing workgroup QOS definitions need no modifications.

<code>
PartitionName=<workgroup> Default=NO PriorityTier=10 ...
   Nodes=<nodelist>
   ...
</code>
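
Filled in with hypothetical values (the workgroup name and node list are illustrative), a definition of this form looks like:

<code>
PartitionName=it_nss Default=NO PriorityTier=10 Nodes=r00n17,r01n22 MaxTime=7-00:00:00
</code>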

The following Bash script was used to convert the hardware-specific partitions and their AllowedQOS levels to workgroup partitions:

<file bash convert-hw-parts.sh>
#!/bin/bash
#
# For each workgroup, gather the nodes it owns from the hardware-specific
# partitions in partitions.conf and emit a workgroup partition definition
# on stdout.
#

WORKGROUPS="..."    # site-specific list of workgroup names (elided)

for WORKGROUP in ${WORKGROUPS}; do
    WORKGROUP_NODELIST="$(
        grep $WORKGROUP partitions.conf | awk '
            BEGIN {
                nodelist = "";
            }
            /^PartitionName=/ {
                for ( i = 1; i <= NF; i++ ) {
                    if ( match($i, "^Nodes=") ) {
                        # take the text after "Nodes=" as the node list:
                        split(substr($i, RSTART + RLENGTH), pieces, " ");
                        if ( nodelist ) {
                            nodelist = nodelist "," pieces[1];
                        } else {
                            nodelist = pieces[1];
                        }
                        break;
                    }
                }
            }
            END {
                printf("%s\n", nodelist);
            }' | snodelist --nodelist=- --unique --compress
    )"
    if [ -n "${WORKGROUP_NODELIST}" ]; then
        cat <<EOT
#
# ${WORKGROUP} (gid $(getent group ${WORKGROUP} | awk -F: '{print $3}'))
#
PartitionName=${WORKGROUP} Default=NO PriorityTier=10 Nodes=${WORKGROUP_NODELIST} MaxTime=7-00:00:00

EOT
    fi
done
</file>
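
Assuming a ''partitions.conf'' laid out as above, the generated stanzas can be captured for review and then merged into the configuration (output file name illustrative):

<code bash>
$ ./convert-hw-parts.sh > workgroup-partitions.conf
</code>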

==== Job Submission Plugin ====

The job submission plugin has been modified to remove the forced assignment of ''--qos=<workgroup>'' to jobs.
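
Under the new scheme, a user requests priority access by naming the workgroup partition explicitly rather than relying on an injected QOS (partition name illustrative):

<code bash>
$ sbatch --partition=it_nss job.qs
</code>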

A second flag was added to ''include/…''.

All changes have been implemented and are visible in the [[https://…]] repository.

==== Job Script Templates ====

The Slurm job script templates available under ''/…'' have been updated to reflect these changes.
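
A sketch of what an updated template header might contain (the ''_workgroup_'' placeholder name and the options shown are assumptions):

<code bash>
#!/bin/bash -l
#
# Job script template: the placeholder below is replaced with the
# submitting user's workgroup partition.
#SBATCH --partition=_workgroup_
#SBATCH --ntasks=1
</code>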

===== Impact =====

At this time, the hardware-specific partitions are seeing relatively little use on Caviness.

New partitions are added to the Slurm configuration (a text file) and distributed to all participating controller and compute nodes.
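
The exact distribution commands are site-specific; a minimal sketch of the reload step, assuming the stock Slurm tools:

<code bash>
# After the updated configuration files have been copied to every node,
# ask slurmctld and all slurmd daemons to re-read them:
$ sudo scontrol reconfigure
</code>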

===== Timeline =====

^Date ^Time ^Goal/Milestone ^
|2018-10-19| |Limitations of hardware-specific partitions discussed|
|2018-10-24| |Project planning to //replace hardware-specific partitions//|
|2018-10-25| |Modifications to job submission plugin completed|
| | |Altered plugin tested and debugged on //Venus// cluster|
| | |Project documentation added to HPC wiki|
|2018-10-26| |Workgroup partition configurations generated and staged for enablement|
| | |Announcement added to login nodes' SSH banner directing users to project documentation|
| | |Job script templates updated and staged for deployment|
|2018-10-29|09:00|Workgroup partitions enabled|
| |09:00|Modified job submission plugin enabled|
| |09:00|Modified job script templates available|
|2018-11-05|09:00| … |
| |09:00| … |