Revision to Slurm job submission to require GPU types
This document summarizes a change to the job submission plugin that prevents jobs from requesting GPU resources without specifying a GPU type.
Issues
Workgroup resource limits are enforced through a Slurm Quality of Service (QOS) record. For example:
[user@login00.caviness ~]$ sacctmgr show qos -np workgroup_X | cut -d\| -f9
cpu=248,mem=3075G,gres/gpu:p100=2,gres/gpu:v100=2,gres/gpu:a100=2
This workgroup purchased GPU nodes in all generations of Caviness:
Generation | GPU type | Count |
---|---|---|
1 | P100 | 2 |
2 | V100 | 2 |
3 | A100 | 2 |
Generically speaking, workgroup_X has access to 6 GPU devices: Slurm implicitly includes a generic resource limit for workgroup_X that would be displayed as gres/gpu=6.
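As an illustration of where that number comes from, the following Python sketch sums the type-specific entries in the QOS record shown above (the implicit_gpu_limit helper is hypothetical, written for this document; it is not how Slurm computes the limit internally):

```python
def implicit_gpu_limit(tres: str) -> int:
    """Sum the gres/gpu:<type>=<count> entries of a QOS TRES string."""
    total = 0
    for entry in tres.split(","):
        name, _, count = entry.partition("=")
        if name.startswith("gres/gpu:"):
            total += int(count)
    return total

qos = "cpu=248,mem=3075G,gres/gpu:p100=2,gres/gpu:v100=2,gres/gpu:a100=2"
print(implicit_gpu_limit(qos))  # 6
```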
When a job is submitted with the flag --gres=gpu:a100:1, Slurm generates the following resource debits for that job:
[user@login00.caviness ~]$ scontrol show job 9897654321 | grep 'TRES='
TRES=cpu=24,mem=500G,node=1,gres/gpu=1,gres/gpu:a100=1
The type-specific (A100) resource limit is affected, as is the generic implicit limit. Consider, though, a job that is submitted with the flag --gres=gpu:
[user@login00.caviness ~]$ scontrol show job 123456789 | grep 'TRES='
TRES=cpu=5,mem=700G,node=1,billing=716805,gres/gpu=1
In this case, only the generic implicit limit is affected. Job 123456789 can use any type of GPU to which workgroup_X has access, but no type-specific limit will influence the scheduler's choice of GPU type. Even if workgroup_X already has running jobs using 2 of 2 A100 GPUs, job 123456789 would be allowed to use a third A100 GPU — effectively borrowing an A100 against the quota of P100 and V100 GPUs they also purchased. This is not the intended behavior.
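The failure mode is easier to see in a toy model. The Python sketch below replays that scenario; the limits and usage dictionaries and the admit() helper are illustrative stand-ins for this document, not Slurm's actual scheduler logic:

```python
# workgroup_X's limits, as shown by sacctmgr above (plus the implicit gres/gpu=6).
limits = {"gres/gpu": 6, "gres/gpu:p100": 2, "gres/gpu:v100": 2, "gres/gpu:a100": 2}
usage = {key: 0 for key in limits}

def admit(debits):
    """Admit a job only if every resource it debits stays within its limit."""
    if any(usage[k] + n > limits[k] for k, n in debits.items()):
        return False
    for k, n in debits.items():
        usage[k] += n
    return True

# Two --gres=gpu:a100:1 jobs debit both the generic and the A100 limits.
print(admit({"gres/gpu": 1, "gres/gpu:a100": 1}))  # True
print(admit({"gres/gpu": 1, "gres/gpu:a100": 1}))  # True
# A third type-specific job is correctly rejected: gres/gpu:a100 is at 2 of 2.
print(admit({"gres/gpu": 1, "gres/gpu:a100": 1}))  # False
# But a bare --gres=gpu job debits only the generic limit, so it is admitted,
# even though the scheduler may then place it on a third A100.
print(admit({"gres/gpu": 1}))  # True
```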
Jobs that request GPUs with --gres=gpu or --gres=gpu:«#» must be altered to include the desired GPU type: for example, --gres=gpu:a100 or --gres=gpu:a100:«#». The command sworkgroup -g «workgroup» --limits displays the GPU types and counts available to your workgroup for jobs in the workgroup partition. Jobs submitted to the standard partition (for which workgroup GPU limits do not apply) must also specify the GPU type once the change goes into effect. If you are unsure of the GPU types and counts available in the standard partition, see Compute Nodes on Caviness.
Generational Change
When Caviness was first built and the job submission plugin written, the cluster contained only P100 GPUs. In all cases where the generic gres/gpu equates with a single type-specific resource (like gres/gpu:p100 originally), jobs that omit the GPU type at submission still debit the resource limit appropriately.
In Generation 2 of Caviness the V100 had become NVIDIA's GPU of choice, and the Slurm configuration added the v100 GPU type alongside the p100 type. At that time the issue with non-specific GPU requests was neither anticipated nor noted.
Generation 3 added the a100 and a40 GPU types.
Implementation
The job submission plugin for Caviness' Slurm job scheduler must be altered to detect the omission of a GPU type and raise an error that prevents the job from being accepted for scheduling. Users attempting to submit a job using non-specific GPU syntax:
[user@login00.caviness ~]$ sbatch --gres gpu:2 --partition workgroup_X …
will receive the error message
No GPU type requested: gpu:2
In this case, the user must choose the specific type of GPU the job requires:
[user@login00.caviness ~]$ sbatch --gres gpu:a100:2 --partition workgroup_X …
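The check the plugin must perform amounts to inspecting the GRES request string for a type field. Here is a minimal sketch of that logic in standalone Python rather than in the plugin itself (the gpu_type_missing helper is hypothetical, and real GRES syntax has variants this sketch does not cover):

```python
# A request names a GPU type when the field after "gpu" is non-numeric,
# e.g. "gpu:a100:2"; "gpu" and "gpu:2" are type-less and must be rejected.
def gpu_type_missing(gres: str) -> bool:
    """True for type-less requests such as 'gpu' or 'gpu:2'."""
    fields = gres.split(":")
    if fields[0] != "gpu":
        return False  # not a GPU request; nothing to check
    return len(fields) == 1 or fields[1].isdigit()

for request in ("gpu", "gpu:2", "gpu:a100", "gpu:a100:2"):
    if gpu_type_missing(request):
        print(f"No GPU type requested: {request}")
    else:
        print(f"OK: {request}")
```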
If you are unsure of the GPU types and counts available in your workgroup partition, use the command sworkgroup -g workgroup_X --limits. Remember that jobs submitted to the standard partition (for which workgroup GPU limits do not apply) must also specify the GPU type. If you are unsure of the GPU types and counts available in the standard partition, see Compute Nodes on Caviness.
Impact
The Slurm scheduler will be restarted to load the updated job submission plugin. Job submission and query (via sbatch, sacct, squeue, for example) will hang for a period anticipated to be less than one minute.
Timeline
Date | Time | Goal/Description |
---|---|---|
2024-01-18 | | Authoring of this document |
2024-01-18 | | Alteration of job submission plugin |
2024-02-01 | 10:00 | Implementation |