abstract:caviness:runjobs:schedule_jobs [2024-01-24 15:56] – [GPU nodes] anita | abstract:caviness:runjobs:schedule_jobs [2024-01-30 17:18] (current) – [GPU nodes] anita |
---|
==== GPU nodes ==== | ==== GPU nodes ==== |
| |
After entering into the workgroup, GPU nodes can be requested through an interactive session using ''salloc'' or through batch submission using ''sbatch''. An appropriate partition name (such as a workgroup for running or ''devel'' if you need to compile on a GPU node) has to be mentioned while running the command as below. | After entering the workgroup, GPU nodes can be requested through an interactive session using ''salloc'' or through batch submission using ''sbatch''. An appropriate partition name (such as a workgroup partition for running jobs, or ''devel'' if you need to compile on a GPU node) and a GPU resource and type **must** be specified when running the command, as shown below. |
| |
<code bash> | <code bash> |
</code> | </code> |
| |
Also if your workgroup has purchased more than one kind of GPU node and you want to target a specific GPU node type, then you can use ''%%--%%gres=gpu:p100'' or ''%%--%%gres=gpu:v100'' or ''%%--%%gres=gpu:t4''. See [[abstract:caviness:runjobs:job_status#sworkgroup|sworkgroup]] to determine your workgroup resources including GPU node type. In the example below, this particular workgroup has (2) ''gpu:p100'' and (2) ''gpu:v100'' types of GPUs available | Also, if your workgroup has purchased more than one kind of GPU node, you need to specify the GPU type you want to target, such as ''%%--%%gres=gpu:p100'', ''%%--%%gres=gpu:v100'', ''%%--%%gres=gpu:t4'' or ''%%--%%gres=gpu:a100'', each of which requests 1 GPU by default, or use the form ''%%--%%gres=gpu:<<GPU type>>:<<#>>'' to request a specific number of GPUs. See [[abstract:caviness:runjobs:job_status#sworkgroup|sworkgroup]] to determine your workgroup resources, including GPU node types. In the example below, this particular workgroup has (2) ''gpu:p100'', (2) ''gpu:v100'' and (2) ''gpu:a100'' GPUs available. |
| |
<code bash> | <code bash> |
[traine@login00 ~]$ sworkgroup -g ececis_research --limits | [traine@login00 ~]$ sworkgroup -g ececis_research --limits |
Partition Per user Per job Per workgroup | Partition Per user Per job Per workgroup |
---------------+--------+-------+------------------------------------------------- | ---------------+--------+-------+----------------------------------------------------------------- |
devel 2 jobs cpu=4 | devel 3 jobs cpu=4 |
ececis_research cpu=152,mem=1882G,gres/gpu:p100=2,gres/gpu:v100=2 | ececis_research cpu=248,mem=3075G,gres/gpu:p100=2,gres/gpu:v100=2,gres/gpu:a100=2 |
reserved | reserved |
standard cpu=720 cpu=360 | standard cpu=720 cpu=360 |
</code> | </code> |
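The ''%%--%%gres=gpu:<<GPU type>>:<<#>>'' form described above can be assembled with a small shell helper. This is only a sketch: the function name ''gres_option'' is hypothetical and is not part of Slurm or of the cluster's tooling. It assumes the GPU type is one of the names reported by ''sworkgroup'' and falls back to a count of 1 when none is given.

```bash
#!/bin/bash
# Hypothetical helper (not a Slurm or cluster command): builds the --gres
# option string in the form --gres=gpu:<GPU type>:<count>.
gres_option() {
    local gpu_type="$1"
    local gpu_count="${2:-1}"   # assume 1 GPU when no count is given

    if [[ -z "$gpu_type" ]]; then
        echo "usage: gres_option <gpu-type> [count]" >&2
        return 1
    fi

    # The leading "--" stops printf from treating the format as an option.
    printf -- '--gres=gpu:%s:%s\n' "$gpu_type" "$gpu_count"
}
```

With the workgroup shown above, the helper might be used as ''salloc %%--%%partition=ececis_research $(gres_option a100 2)''. Whether that request is granted still depends on the limits reported by ''sworkgroup''.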
| |
Any user can employ a GPU by running in the ''standard'' partition, however keep in mind jobs can be preempted and would require [[abstract:caviness:runjobs:schedule_jobs#handling-system-signals-aka-checkpointing|checkpointing]] as part of your batch job script. The interactive session example below requests any node with a GPU v100, 1 core and 1 GB of memory (default values if not specified) on the standard partition. | Any user can employ a GPU by running in the ''standard'' partition; however, keep in mind that a GPU type **must** be specified and that jobs can be preempted, so your batch job script should include [[abstract:caviness:runjobs:schedule_jobs#handling-system-signals-aka-checkpointing|checkpointing]]. The interactive session example below requests any node with (2) GPUs of type v100, along with 1 core, 1 GB of memory and 30 minutes of time (the default values if not specified), on the ''standard'' partition. |
| |
<code bash> | <code bash> |
salloc --partition=standard --gres=gpu:v100 | salloc --partition=standard --gres=gpu:v100:2 |
</code> | </code> |
| |
to allocate any type of GPU node available for your interactive job in the standard partition. | If you are unsure of the GPU types and counts available in the ''standard'' partition, see [[abstract:caviness:caviness#compute-nodes|Compute Nodes]] on Caviness. |
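The same request can also be made through batch submission with ''sbatch''. The script below is a minimal sketch, not the site's job template: it mirrors the ''salloc'' example and spells out the default core, memory and time values as explicit options. The function ''report_allocation'' is a hypothetical placeholder for the real workload, and the script assumes Slurm exports ''SLURM_JOB_ID'' and ''CUDA_VISIBLE_DEVICES'' into the job environment. Checkpointing, which preemptible jobs on ''standard'' need, is left out here.

```bash
#!/bin/bash
#SBATCH --partition=standard
#SBATCH --gres=gpu:v100:2
#SBATCH --ntasks=1
#SBATCH --mem=1G
#SBATCH --time=00:30:00

# Hypothetical placeholder for the real workload: report the job ID and the
# GPUs visible to the job. Both variables are assumed to be set by Slurm;
# outside of a job they are reported as "none".
report_allocation() {
    echo "job=${SLURM_JOB_ID:-none} gpus=${CUDA_VISIBLE_DEVICES:-none}"
}

report_allocation
```

Saved under a name of your choosing, the script would be submitted with ''sbatch <script name>'' after entering the workgroup.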
==== Enhanced Local Scratch nodes ==== | ==== Enhanced Local Scratch nodes ==== |
| |