====== The job queues (partitions) on DARWIN ======

The DARWIN cluster has several partitions (queues) available to specify when running jobs. These partitions correspond to the various node types available in the cluster:

^Partition Name^Description^
|standard|Contains all 48 standard memory nodes (64 cores, 512 GiB memory per node)|
|large-mem|Contains all 32 large memory nodes (64 cores, 1024 GiB memory per node)|
|xlarge-mem|Contains all 11 extra-large memory nodes (64 cores, 2048 GiB memory per node)|
|extended-mem|Contains the single extended memory node (64 cores, 1024 GiB memory + 2.73 TiB NVMe swap)|
|gpu-t4|Contains all 9 NVIDIA Tesla T4 GPU nodes (64 cores, 512 GiB memory, 1 T4 GPU per node)|
|gpu-v100|Contains all 3 NVIDIA Tesla V100 GPU nodes (48 cores, 768 GiB memory, 4 V100 GPUs per node)|
|gpu-mi50|Contains the single AMD Radeon Instinct MI50 GPU node (64 cores, 512 GiB memory, 1 MI50 GPU)|
|idle|Contains all nodes in the cluster; jobs in this partition can be preempted but are not charged against your allocation|

===== Requirements for all partitions =====

All partitions on DARWIN have two requirements for submitting jobs:
  - You must set an allocation workgroup prior to submitting a job by using the **workgroup** command.
  - You must explicitly request a single partition in your job submission using the ''--partition'' flag.

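The two steps above can be sketched as a shell session; the workgroup name ''it_css'' and job script ''myjob.qs'' are placeholders, not names from this page:

```bash
# Set your allocation workgroup first (it_css is an example; use your own group)
workgroup -g it_css

# Every submission must explicitly name exactly one partition
sbatch --partition=standard myjob.qs
```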
===== Defaults and limits for all partitions =====

All partitions on DARWIN except ''idle'' have the following defaults:
  * Default run time of 30 minutes
  * Default resources of 1 node, 1 CPU, and 1 GiB memory
  * Default **no** preemption

All partitions on DARWIN except ''idle'' have the following limits:
  * Maximum run time of 7 days
  * Maximum of 400 jobs per user per partition

The ''idle'' partition instead has the following characteristics:
  * **Preemption is enabled for all jobs**
  * Maximum of 320 jobs per user
  * Maximum of 640 CPUs per user (across all jobs in the partition)

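The defaults above can be overridden per job with standard Slurm directives. A sketch of a batch-script header (the partition and resource values are illustrative, not recommendations):

```bash
#SBATCH --partition=large-mem
#SBATCH --time=2-00:00:00   # override the 30-minute default (7-day maximum applies)
#SBATCH --nodes=1
#SBATCH --ntasks=8          # override the 1-CPU default
#SBATCH --mem=32G           # override the 1 GiB memory default
```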
===== The extended-mem partition =====

Because access to the swap cannot be limited via Slurm, the ''extended-mem'' partition can only be used in exclusive mode: each job is allocated the entire node, so the NVMe swap is never shared between jobs.

===== The GPU partitions =====

Jobs that will run in one of the GPU partitions must request GPU resources using ONE of the following flags:

^Flag^Description^
|''--gres=gpu:<count>''|Request <count> GPUs on each node allocated|
|''--gpus=<count>''|Request <count> GPUs in total for the job|
|''--gpus-per-node=<count>''|Request <count> GPUs on each node allocated|
|''--gpus-per-task=<count>''|Request <count> GPUs for each task in the job|

<note warning>If you do not specify one of these flags, your job will not be permitted to run in the GPU partitions.</note>

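For example, a single-GPU job on the T4 nodes only needs one of the flags above; a minimal sketch of a batch-script header:

```bash
#SBATCH --partition=gpu-t4
#SBATCH --gpus=1            # request one GPU in total for the job
```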
===== The idle partition =====

The ''idle'' partition contains all nodes in the cluster.

<note warning>**Preemption is enabled** for every job in the ''idle'' partition: a running job can be killed at any time. Only submit work that can tolerate being interrupted.</note>

Jobs in the ''idle'' partition are preempted when the resources they are using are required for jobs in one of the other partitions.

Jobs that execute in the ''idle'' partition are not charged against your allocation.
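A sketch of an ''idle'' submission; the ''--requeue'' directive is a standard Slurm flag added here as an assumption, useful only if your job can safely restart after preemption:

```bash
#SBATCH --partition=idle
#SBATCH --requeue           # re-queue automatically if the job is preempted
```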