====== The job queues (partitions) on DARWIN ======

The DARWIN cluster has several partitions (queues) that can be specified when submitting jobs.  These partitions correspond to the various node types available in the cluster:

^Kind^Description^
|standard|Contains all 48 standard memory nodes (64 cores, 512 GiB memory per node)|
|large-mem|Contains all 32 large memory nodes (64 cores, 1024 GiB memory per node)|
|xlarge-mem|Contains all 11 extra-large memory nodes (64 cores, 2048 GiB memory per node)|
|extended-mem|Contains the single extended memory node (64 cores, 1024 GiB memory + 2.73 TiB NVMe swap)|
|gpu-t4|Contains all 9 NVIDIA Tesla T4 GPU nodes (64 cores, 512 GiB memory, 1 T4 GPU per node)|
|gpu-v100|Contains all 3 NVIDIA Tesla V100 GPU nodes (48 cores, 768 GiB memory, 4 V100 GPUs per node)|
|gpu-mi50|Contains the single AMD Radeon Instinct MI50 GPU node (64 cores, 512 GiB memory, 1 MI50 GPU)|

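As a quick check, Slurm's ''sinfo'' command lists these partitions and their current node states (a minimal sketch; the exact output depends on the cluster state at the time):

<code bash>
# Summarize all partitions and their node counts/states:
sinfo --summarize

# Show only the nodes in a single partition, e.g. standard:
sinfo -p standard
</code>
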
===== Requirements for all partitions =====

All partitions on DARWIN have two requirements for submitting jobs (see the example below):
  - You must set your workgroup prior to submitting a job by using the **workgroup** command (e.g., ''workgroup -g it_nss'').  This ensures jobs are billed against the correct account in Slurm.
  - You must explicitly request a single partition in your job submission using ''%%--%%partition'' or ''-p''.

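A minimal submission satisfying both requirements might look like the following (''myjob.qs'' is a placeholder for your own job script; substitute your workgroup for ''it_nss''):

<code bash>
# Set the workgroup so the job is billed to the correct Slurm account:
workgroup -g it_nss

# Submit the job, explicitly naming exactly one partition:
sbatch --partition=standard myjob.qs
</code>
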
===== Defaults and limits for all partitions =====

All partitions on DARWIN have the following defaults:
  * Default run time of 30 minutes
  * Default resources of 1 node, 1 CPU, and 1 GiB memory
  * **No** preemption by default

All partitions on DARWIN have the following limits (see the sketch after this list):
  * Maximum run time of 2 days
  * Maximum of 400 jobs per user per partition

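Any job needing more than the defaults must request resources explicitly; the values below are illustrative and must stay within the limits above:

<code bash>
# Request 2 hours of run time (within the 2-day maximum), 1 node,
# 4 CPUs, and 8 GiB of memory instead of the 30-minute/1-CPU/1-GiB defaults:
sbatch --partition=standard --time=2:00:00 --nodes=1 --cpus-per-task=4 --mem=8G myjob.qs
</code>
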
===== The extended-mem partition =====

Because access to swap cannot be limited via Slurm, the extended-mem partition is configured to run all jobs in exclusive-user mode.  This means only a single user can be on the node at a time, but that user can run one or more jobs on the node.  All jobs on the node have access to the full amount of swap available, so take care with swap usage when running multiple jobs.

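Submission to this partition is otherwise unremarkable (a sketch; ''myjob.qs'' is a placeholder):

<code bash>
# No extra flags are needed for exclusive access; the partition itself
# enforces exclusive-user mode:
sbatch --partition=extended-mem myjob.qs
</code>
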
===== The GPU partitions =====

Jobs that will run in one of the GPU partitions can request GPU resources using ONE of the following flags:

^Flag^Description^
|''%%--%%gpus=<count>''|<count> GPUs total for the job, regardless of node count|
|''%%--%%gpus-per-node=<count>''|<count> GPUs required on each node allocated to the job|
|''%%--%%gpus-per-socket=<count>''|<count> GPUs required on each socket allocated to the job|
|''%%--%%gpus-per-task=<count>''|<count> GPUs required for each task in the job|

If you do not specify one of these flags, your job will not be allocated any GPUs.

**PLEASE NOTE:**  On DARWIN the ''%%--%%gres'' flag should NOT be used to request GPU resources.  If the GPU type is not specified, it is inferred from the partition to which the job is submitted.
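
For example, a job requesting a single GPU in the ''gpu-t4'' partition might be submitted as follows (''myjob.qs'' is a placeholder):

<code bash>
# Request 1 GPU total for the job; the GPU type (T4) is inferred from
# the partition, so neither --gres nor a type specifier is needed:
sbatch --partition=gpu-t4 --gpus=1 myjob.qs
</code>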