====== Revisions to Slurm Configuration v2.0.0 on Caviness ======

This document summarizes alterations to the Slurm job scheduler configuration on the Caviness cluster.

===== Issues =====

==== Large queue sizes ====
There are currently no limits on the number of jobs each user can submit on Caviness.  Each scheduling cycle performs the following work across the pending queue (a simplified sketch follows the list below):
  * Calculation of owning user's fair-share priority (based on decaying usage history)
  * Calculation of overall job priority (fair-share, wait time, QOS, partition, job size)
  * Sorting of all jobs in the queue based on priority
  * From the head of the queue up:
    * Search for free resources matching requested resources
    * Start execution if the job is eligible and resources are free
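A minimal sketch of that per-cycle work is shown below (plain Python, written for this document; the job fields, ''usage_history'' map, and ''weights'' dictionary are hypothetical, and the real scheduler is far more involved).  Its only purpose is to show why the cost of a cycle grows with the length of the pending queue.

<code python>
# Simplified model of one scheduling cycle (illustration only, NOT Slurm's
# actual implementation).  Every step below touches the entire pending queue.

def scheduling_cycle(pending_jobs, usage_history, weights, free_nodes):
    for job in pending_jobs:
        # Fair-share priority derived from the owner's decaying usage history.
        job["fairshare"] = 1.0 / (1.0 + usage_history.get(job["user"], 0.0))

        # Overall priority: weighted sum of the normalized factors.
        job["priority"] = sum(weights[f] * job.get(f, 0.0)
                              for f in ("qos", "age", "fairshare", "size"))

    # Sort the whole queue, highest priority first.
    pending_jobs.sort(key=lambda j: j["priority"], reverse=True)

    # From the head of the queue up: start jobs that fit in the free resources.
    started = []
    for job in pending_jobs:
        if job["nodes"] <= free_nodes:
            free_nodes -= job["nodes"]
            started.append(job)
    return started
</code>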

The fair-share calculations require extensive queries against the job database, and locating free resources is a complex operation.  Both scale with the number of pending jobs, so every additional job in the queue lengthens each scheduling cycle.

Many Caviness users are used to submitting a job and immediately seeing (via ''squeue'') the state of that job in the queue; when the queue grows very large, the scheduler's responsiveness suffers and that feedback is delayed.

One reason the Slurm queue on Caviness can see degraded scheduling efficiency when filled with too many jobs relates to the ordering of the jobs, and thus to the job priority.  The priority is computed as a weighted sum of the following factors:

^ factor ^ multiplier ^ notes ^
| qos override (priority-access) | 20000 | standard=0.0, priority-access=1.0 |
| wait time (age) | 8000 | longest wait time in queue=1.0 |
| fair-share | 4000 | see ''sshare'' for per-account/per-user values |
| partition id | 2000 | 1.0 for all partitions |
| job resource size | 1 | largest resource request=1.0 |

Next to priority access, wait time is the largest factor: a job that has waited the longest in the queue receives the full 8000-point contribution, twice the maximum possible fair-share contribution.

Taken together, these factors allow a single user (even one holding a very small share of purchased cluster resources) to submit thousands of jobs that quickly sort toward the head of the pending queue as they age.  The weight on wait time then prioritizes those jobs over jobs submitted by users who have not been using the cluster, contrary to the goals of fair-share.
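As a concrete illustration of that effect, the worked example below plugs hypothetical (but plausible) factor values into the multipliers from the table above: a long-waiting job from a user whose fair-share is nearly exhausted ends up ahead of a brand-new job from a user with an untouched share.

<code python>
# Worked example using the current multipliers.  The two jobs and their
# factor values are hypothetical; Slurm normalizes each factor to 0.0-1.0.
WEIGHTS = {"qos": 20000, "age": 8000, "fairshare": 4000, "partition": 2000, "size": 1}

def priority(factors):
    return sum(WEIGHTS[name] * value for name, value in factors.items())

# Job A: heavy user (fair-share nearly exhausted) whose job has aged to the cap.
job_a = {"qos": 0.0, "age": 1.0, "fairshare": 0.05, "partition": 1.0, "size": 0.1}

# Job B: freshly submitted job from a user with an untouched fair-share.
job_b = {"qos": 0.0, "age": 0.0, "fairshare": 1.0, "partition": 1.0, "size": 0.1}

print(priority(job_a))   # -> 10200.1
print(priority(job_b))   # ->  6000.1
# The aged job from the heavy user outranks the idle user's new job.
</code>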

===== Solutions =====

==== Job submission limits ====
On many HPC systems per-user limits are enacted to restrict how many pending jobs can be present in the queue: once a user reaches the limit, additional submissions are rejected until some of that user's queued jobs complete.
It would be preferable to avoid enacting such limits on Caviness.
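For reference, sites that do impose such limits typically use Slurm's QOS or association limits (for example ''MaxSubmitJobsPerUser''), which simply cause ''sbatch'' to fail once the cap is reached.  The sketch below is a hypothetical client-side wrapper that imitates that behavior by counting the caller's pending jobs with ''squeue''; it is illustrative only and is not how Slurm itself enforces a limit.

<code python>
# Hypothetical submission wrapper: refuse to submit once the caller already
# has MAX_PENDING jobs in pending (PD) state.  Illustration only.
import getpass
import subprocess
import sys

MAX_PENDING = 1000   # example cap; not an actual Caviness policy

def pending_job_count(user):
    # squeue -h suppresses the header; -t PD selects pending jobs; -o %i prints job IDs.
    out = subprocess.run(
        ["squeue", "-h", "-u", user, "-t", "PD", "-o", "%i"],
        capture_output=True, text=True, check=True).stdout
    return len(out.split())

if __name__ == "__main__":
    user = getpass.getuser()
    if pending_job_count(user) >= MAX_PENDING:
        sys.exit(f"{user} already has {MAX_PENDING} pending jobs; try again later")
    subprocess.run(["sbatch"] + sys.argv[1:], check=True)
</code>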
==== Altered priority weights ====

The dominance of wait time in the priority calculation is probably the largest contributor to this problem; reducing its weight relative to fair-share should restore the intended behavior.

The Slurm documentation also points out that the priority factor weights should be of a magnitude that allows enough significant digits from each factor (a minimum of 1000 for important factors).  The revised weights were chosen with the following considerations in mind:
  - Partitions all contribute the same value, therefore the weight can be 0
  - Priority-access should unequivocally bias the priority higher; as a binary factor (0.0 or 1.0) very few bits should be necessary
  - Fair-share should outweigh the remaining factors in importance
  - Wait time and job size should be considered equivalent (or nearly so, with wait time greater than job size)
    * The job size is determined by the **PriorityWeightTRES** option; it is currently set to the default, which is empty and therefore yields 0.0 for every job(!)

It seems appropriate to split the 32-bit priority value into groups of bits that represent each priority-weighting tier (the arithmetic is verified in the short check after the table):

^ mask ^ tier ^
| 3 << 30 = ''0xC0000000'' | qos override (priority-access) |
| 262143 << 12 = ''0x3FFFF000'' | fair-share |
| 4095 = ''0x00000FFF'' | wait time and job size |
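A quick check (ordinary Python, included here only to verify the arithmetic) confirms that these three masks are disjoint and together cover the entire 32-bit priority range:

<code python>
# Verify that the proposed tiers partition the 32-bit priority value.
QOS_MASK       = 3 << 30          # 0xC0000000: qos override (priority-access)
FAIRSHARE_MASK = 262143 << 12     # 0x3FFFF000: fair-share
AGE_SIZE_MASK  = 4095             # 0x00000FFF: wait time and job size

# The tiers must not overlap ...
assert QOS_MASK & FAIRSHARE_MASK == 0
assert QOS_MASK & AGE_SIZE_MASK == 0
assert FAIRSHARE_MASK & AGE_SIZE_MASK == 0

# ... and together they should use every bit of the 32-bit value.
assert QOS_MASK | FAIRSHARE_MASK | AGE_SIZE_MASK == 0xFFFFFFFF

print(hex(QOS_MASK), hex(FAIRSHARE_MASK), hex(AGE_SIZE_MASK))
</code>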

The wait time and job size group of bits is split 60% to wait time and 40% to job size (see the continuation of the check below):

^ mask ^ sub-factor ^
| 2457 = ''0x999'' | wait time (age) |
| 1638 = ''0x666'' | job size |
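Continuing the check above, the 60/40 split and the resulting per-factor weights can be computed directly.  The parameter names in the comments (**PriorityWeightQOS**, **PriorityWeightFairshare**, **PriorityWeightAge**, **PriorityWeightPartition**, **PriorityWeightJobSize**, **PriorityWeightTRES**) are the standard Slurm multifactor-priority options, but the assignment of these particular values to them is an assumed mapping of the tiers above, not a confirmed configuration.

<code python>
# Split the bottom 12 bits (4095) 60/40 between wait time and job size,
# then lay out the full set of proposed weights.  Values are an assumed
# mapping of the tiers above onto standard slurm.conf parameters.
AGE_SIZE_MASK = 4095

age_weight  = AGE_SIZE_MASK * 6 // 10      # 2457 -> PriorityWeightAge
size_weight = AGE_SIZE_MASK - age_weight   # 1638 -> carried by PriorityWeightTRES

weights = {
    "PriorityWeightQOS":       3 << 30,       # 3221225472
    "PriorityWeightFairshare": 262143 << 12,  # 1073737728
    "PriorityWeightAge":       age_weight,    # 2457
    "PriorityWeightPartition": 0,             # all partitions contribute equally
    "PriorityWeightJobSize":   0,             # job size handled via PriorityWeightTRES
}

# The sum of all contributions must still fit in the 32-bit priority value.
assert sum(weights.values()) + size_weight == 0xFFFFFFFF
</code>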

**PriorityWeightTRES** must also be set in order to yield a non-zero contribution from the job size factor; internally, the weights listed in **PriorityWeightTRES** are converted to double-precision floating-point values.
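For illustration only, assuming the job size weight above were expressed as a cpu-based TRES weight (the exact TRES string used on Caviness is not recorded here), the option takes a comma-separated ''name=weight'' list that Slurm parses into per-TRES floating-point weights, along these lines:

<code python>
# Hypothetical PriorityWeightTRES value for the job size tier above; the
# option is a comma-separated list of TRES=weight pairs, and each weight is
# stored internally as a double-precision float.
priority_weight_tres = "cpu=1638"

tres_weights = {}
for pair in priority_weight_tres.split(","):
    name, weight = pair.split("=")
    tres_weights[name.strip().lower()] = float(weight)   # double-precision

print(tres_weights)   # {'cpu': 1638.0}
</code>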

<note important>
===== Implementation =====

===== Impact =====

No downtime is expected to be required.

===== Timeline =====
^ Date ^ Time ^ Goal/Milestone ^
| 2019-10-24 | | Authoring of this document |