Differences

--- abstract:caviness:runjobs:schedule_jobs [2024-01-24 15:56] – [GPU nodes] anita
+++ abstract:caviness:runjobs:schedule_jobs [2024-01-30 17:18] (current) – [GPU nodes] anita
@@ Line 288: / Line 288: @@
 ==== GPU nodes ====
-After entering into the workgroup, GPU nodes can be requested through an interactive session using ''salloc'' or through batch submission using ''sbatch''. An appropriate partition name (such as a workgroup for running or ''devel'' if you need to compile on a GPU node) has to be mentioned while running the command as below.
+After entering the workgroup, GPU nodes can be requested through an interactive session using ''salloc'' or through batch submission using ''sbatch''. An appropriate partition name (such as your workgroup name for running jobs, or ''devel'' if you need to compile on a GPU node) and a GPU resource and type **must** be specified when running the command, as shown below.
 <code bash>
@@ Line 315: / Line 315: @@
 </code>
-Also if your workgroup has purchased more than one kind of GPU node and you want to target a specific GPU node type, then you can use ''%%--%%gres=gpu:p100'' or ''%%--%%gres=gpu:v100'' or ''%%--%%gres=gpu:t4''.  See [[abstract:caviness:runjobs:job_status#sworkgroup|sworkgroup]] to determine your workgroup resources including GPU node type. In the example below, this particular workgroup has (2) ''gpu:p100'' and (2) ''gpu:v100'' types of GPUs available
+Also, if your workgroup has purchased more than one kind of GPU node, you need to specify the GPU type you want to target, such as ''%%--%%gres=gpu:p100'', ''%%--%%gres=gpu:v100'', ''%%--%%gres=gpu:t4'' or ''%%--%%gres=gpu:a100'', which requests 1 GPU by default. To request more than one GPU, use the form ''%%--%%gres=gpu:<<GPU type>>:<<#>>''. See [[abstract:caviness:runjobs:job_status#sworkgroup|sworkgroup]] to determine your workgroup resources, including GPU node type. In the example below, this particular workgroup has (2) ''gpu:p100'', (2) ''gpu:v100'' and (2) ''gpu:a100'' GPUs available:
 <code bash>
 [traine@login00 ~]$ sworkgroup -g ececis_research --limits
 Partition       Per user Per job Per workgroup
----------------+--------+-------+-------------------------------------------------
+---------------+--------+-------+-----------------------------------------------------------------
-devel           2 jobs   cpu=4
+devel           3 jobs   cpu=4
-ececis_research                  cpu=152,mem=1882G,gres/gpu:p100=2,gres/gpu:v100=2
+ececis_research                  cpu=248,mem=3075G,gres/gpu:p100=2,gres/gpu:v100=2,gres/gpu:a100=2
 reserved
 standard        cpu=720  cpu=360
 </code>
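The GPU types and counts can also be extracted from the ''sworkgroup'' limits output using standard text tools. The sketch below is a hypothetical helper: ''list_gpu_limits'' is not a Caviness command. It assumes the ''gres/gpu:TYPE=COUNT'' layout shown in the example above, so check its result against the full ''sworkgroup'' output before relying on it.

<code bash>
# Hypothetical helper (not part of Caviness): print "TYPE COUNT" for each
# gres/gpu:TYPE=COUNT entry found in `sworkgroup --limits` output on stdin.
list_gpu_limits() {
  grep -o 'gres/gpu:[A-Za-z0-9]*=[0-9]*' |
    sed -e 's|^gres/gpu:||' -e 's|=| |'
}

# Usage (assumes the output layout shown above):
#   sworkgroup -g ececis_research --limits | list_gpu_limits
# For the example workgroup this would print one line per GPU type:
#   p100 2
#   v100 2
#   a100 2
</code>

A type from this list can then be passed to ''%%--%%gres'', for example ''%%--%%gres=gpu:a100''.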
-Any user can employ a GPU by running in the ''standard'' partition, however keep in mind jobs can be preempted and would require [[abstract:caviness:runjobs:schedule_jobs#handling-system-signals-aka-checkpointing|checkpointing]] as part of your batch job script.  The interactive session example below requests any node with a GPU v100, 1 core and 1 GB of memory (default values if not specified) on the standard partition.
+Any user can employ a GPU by running in the ''standard'' partition. However, a GPU type **must** be specified, and jobs can be preempted, so your batch job script will need [[abstract:caviness:runjobs:schedule_jobs#handling-system-signals-aka-checkpointing|checkpointing]]. The interactive session example below requests any node with (2) v100 GPUs, 1 core, 1 GB of memory and 30 minutes of time on the ''standard'' partition. The core, memory and time values shown are the defaults used when they are not specified.
 <code bash>
-salloc --partition=standard --gres=gpu:v100
+salloc --partition=standard --gres=gpu:v100:2
 </code>
-to allocate any type of GPU node available for your interactive job in the standard partition.
+If you are unsure of the GPU types and counts available in the ''standard'' partition, see [[abstract:caviness:caviness#compute-nodes|Compute Nodes]] on Caviness.
 ==== Enhanced Local Scratch nodes ====