Revision to Slurm job submission to require GPU types

This document summarizes a change to the job submission plugin that rejects GPU requests which do not name a specific GPU type.

Workgroup resource limits are effected through a Slurm Quality of Service (QOS) record. For example:

[user@login00.caviness ~]$ sacctmgr show qos -np workgroup_X | cut -d\| -f9
cpu=248,mem=3075G,gres/gpu:p100=2,gres/gpu:v100=2,gres/gpu:a100=2

This workgroup purchased GPU nodes in all generations of Caviness:

Generation	GPU type	Count
1	P100	2
2	V100	2
3	A100	2

Generically speaking, workgroup_X has access to 6 GPU devices: Slurm implicitly includes a generic resource limit for workgroup_X that would be displayed as gres/gpu=6.
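The relationship between the typed limits and the implicit generic limit can be checked with a little arithmetic. The following is a minimal Python sketch, not part of Slurm or the Caviness tooling: the function names parse_tres and typed_gpu_limits are hypothetical, and the input is the QOS string shown above.

```python
def parse_tres(tres: str) -> dict[str, str]:
    """Split a TRES string such as 'cpu=248,mem=3075G' into a name -> value dict."""
    return dict(item.split("=", 1) for item in tres.split(",") if item)


def typed_gpu_limits(tres: str) -> dict[str, int]:
    """Return {gpu_type: limit} for every 'gres/gpu:<type>' entry in the string."""
    prefix = "gres/gpu:"
    return {
        name[len(prefix):]: int(value)
        for name, value in parse_tres(tres).items()
        if name.startswith(prefix)
    }


# QOS limit string copied from the sacctmgr output above.
qos_limits = "cpu=248,mem=3075G,gres/gpu:p100=2,gres/gpu:v100=2,gres/gpu:a100=2"

per_type = typed_gpu_limits(qos_limits)
generic_total = sum(per_type.values())

print(per_type)       # {'p100': 2, 'v100': 2, 'a100': 2}
print(generic_total)  # 6, matching the implicit gres/gpu=6
```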

When a job is submitted with the flag --gres=gpu:a100:1 Slurm generates the following resource debits for that job:

[user@login00.caviness ~]$ scontrol show job 9897654321 | grep 'TRES='
   TRES=cpu=24,mem=500G,node=1,gres/gpu=1,gres/gpu:a100=1

The type-specific (A100) resource limit is affected, as is the generic implicit limit. Consider, though, a job that is submitted with the flag --gres=gpu:

[user@login00.caviness ~]$ scontrol show job 123456789 | grep 'TRES='
   TRES=cpu=5,mem=700G,node=1,billing=716805,gres/gpu=1

In this case, only the generic implicit limit is affected. Job 123456789 can use any type of GPU to which workgroup_X has access, but no type-specific limit will influence the scheduler's choice of GPU type. Even if workgroup_X already has running jobs using 2 of 2 A100 GPUs, job 123456789 would be allowed to use a third A100 GPU, effectively borrowing an A100 against the workgroup's quota of P100 and V100 GPUs. This is not the intended behavior.
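The borrowing scenario can be illustrated with a deliberately simplified model of limit checking. This Python sketch is an assumption made for illustration only and is not Slurm's actual accounting code: it supposes that a job is admitted when every TRES it debits stays within the matching limit, and the function name within_limits is hypothetical.

```python
# Limits for workgroup_X, including the implicit generic limit of 6.
limits = {"gres/gpu": 6, "gres/gpu:p100": 2, "gres/gpu:v100": 2, "gres/gpu:a100": 2}

# Current usage: running jobs already hold 2 of 2 A100 GPUs.
usage = {"gres/gpu": 2, "gres/gpu:a100": 2}


def within_limits(job_tres: dict, usage: dict, limits: dict) -> bool:
    """True if adding the job's debits keeps every debited TRES within its limit.

    Simplified model: only the TRES names that appear in job_tres are checked.
    """
    return all(
        usage.get(name, 0) + count <= limits.get(name, float("inf"))
        for name, count in job_tres.items()
    )


typed_job = {"gres/gpu": 1, "gres/gpu:a100": 1}  # --gres=gpu:a100:1
untyped_job = {"gres/gpu": 1}                    # --gres=gpu

print(within_limits(typed_job, usage, limits))    # False: the A100 limit is exhausted
print(within_limits(untyped_job, usage, limits))  # True: only gres/gpu=6 is consulted
```

Under this model the untyped job passes because it never debits gres/gpu:a100, which is the gap the plugin change closes.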

[VERY IMPORTANT] Once this change goes into effect, all job scripts and command line requests for GPUs that use the syntax --gres=gpu or --gres=gpu:«#» must be altered to include the desired GPU type: for example, --gres=gpu:a100 or --gres=gpu:a100:«#». The command sworkgroup -g «workgroup» --limits displays the GPU types and counts available to your workgroup for jobs in the workgroup partition. Jobs submitted to the standard partition (for which workgroup GPU limits do not apply) must also specify the GPU type once the change goes into effect. If you are unsure of the GPU types and counts available in the standard partition, see Compute Nodes on Caviness.
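To find job scripts that still need updating, something like the following Python sketch could be used. It is a hypothetical helper, not a tool provided on Caviness: the function name find_untyped_gpu_requests and the example script are invented for illustration, and it only inspects --gres options (other GPU-related options are not examined).

```python
import re

# Captures the value of '--gres=VALUE' or '--gres VALUE' on a line.
GRES_OPTION = re.compile(r"--gres(?:=|\s+)(\S+)")
# An untyped GPU entry is exactly 'gpu' or 'gpu:<count>'.
UNTYPED_GPU = re.compile(r"gpu(?::\d+)?")


def find_untyped_gpu_requests(script_text: str) -> list[tuple[int, str]]:
    """Return (line_number, line) pairs whose --gres value omits the GPU type."""
    hits = []
    for number, line in enumerate(script_text.splitlines(), start=1):
        for value in GRES_OPTION.findall(line):
            # A --gres value may be a comma-separated list of resources.
            if any(UNTYPED_GPU.fullmatch(entry) for entry in value.split(",")):
                hits.append((number, line.strip()))
                break
    return hits


# Hypothetical job script used only to exercise the function.
example_script = """\
#!/bin/bash
#SBATCH --partition=workgroup_X
#SBATCH --gres=gpu:2
#SBATCH --mem=16G
srun ./my_program
"""

for number, line in find_untyped_gpu_requests(example_script):
    print(f"line {number}: {line}")
```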

When Caviness was first built and the job submission plugin written, the cluster contained only P100 GPUs. Whenever the generic gres/gpu corresponds to a single type-specific resource (as it originally did with gres/gpu:p100), jobs that omit the GPU type at submission still debit the resource limit appropriately.

By Generation 2 of Caviness, the V100 GPU had become the model of choice from NVIDIA, and the Slurm configuration added the v100 GPU type alongside the p100 type. At that time, the issue with non-specific GPU requests was neither anticipated nor noticed.

Generation 3 added a100 and a40 GPU types.

The job submission plugin for Caviness' Slurm job scheduler must be altered to detect the omission of GPU type and raise an error that prevents the job from being accepted for scheduling. Users attempting to submit a job using non-specific GPU syntax:

[user@login00.caviness ~]$ sbatch --gres gpu:2 --partition workgroup_X …

will receive the error message

No GPU type requested: gpu:2

In this case, the user must choose the specific type of GPU the job requires:

[user@login00.caviness ~]$ sbatch --gres gpu:a100:2 --partition workgroup_X …

If you are unsure of the GPU types and counts available in your workgroup partition, use the command sworkgroup -g workgroup_X --limits. Remember that jobs submitted to the standard partition (for which workgroup GPU limits do not apply) must also specify the GPU type. If you are unsure of the GPU types and counts available in the standard partition, see Compute Nodes on Caviness.
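The check described above can be sketched in Python as follows. This is an illustration of the logic only and not the plugin's source, which is not shown in this document. The function names are hypothetical, and it is an assumption that the error message echoes the offending entry.

```python
def untyped_gpu_entries(gres: str) -> list[str]:
    """Return the entries of a --gres value that request GPUs without a type."""
    untyped = []
    for entry in gres.split(","):
        fields = entry.strip().split(":")
        if fields[0] != "gpu":
            continue  # some other generic resource; not checked here
        # 'gpu' alone, or 'gpu:<count>', carries no type field.
        if len(fields) == 1 or (len(fields) == 2 and fields[1].isdigit()):
            untyped.append(entry.strip())
    return untyped


def check_gres(gres: str) -> None:
    """Raise ValueError, mimicking the rejection message, if a GPU type is missing."""
    untyped = untyped_gpu_entries(gres)
    if untyped:
        # Assumption: the message reports the first offending entry.
        raise ValueError(f"No GPU type requested: {untyped[0]}")


for request in ("gpu:2", "gpu:a100:2"):
    try:
        check_gres(request)
        print(f"{request}: accepted")
    except ValueError as err:
        print(f"{request}: rejected ({err})")
```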

The Slurm scheduler will be restarted to load the updated job submission plugin. During the restart, job submission and query commands (sbatch, sacct, and squeue, for example) will hang for a period expected to last less than one minute.

Date	Time	Goal/Description
2024-01-18		Authoring of this document
2024-01-18		Alteration of job submission plugin
2024-02-01	10:00	Implementation
