abstract:caviness:runjobs:queues [2023-05-30 13:48] (current) – [The job queues (partitions) on Caviness] anita
+ | |||
+ | ====== The job queues (partitions) on Caviness ====== | ||
+ | |||
+ | The Caviness cluster has several kinds of partition (queue) available in which to run jobs: | ||
+ | |||
+ | ^Kind^Description^Nodes^ | ||
+ | |standard|The default partition if no '' | ||
+ | |devel|A partition with very short runtime limits and small resource limits; important to use for any development using compilers|'' | ||
+ | |workgroup-specific|Partitions associated with specific kinds of compute equipment in the cluster purchased by a research group <<// | ||
+ | |||
===== The standard partition =====

This partition is the default when no ''

The standard partition combines ideas from the standby and spillover queue concepts used on the earlier clusters.
+ | |||
+ | Limits to jobs submitted to this partition are: | ||
+ | * a maximum runtime of 7 days (default is 30 minutes) | ||
+ | * Maximum number of CPUs per job = 360 | ||
+ | * Maximum CPUs per user = 720 | ||
+ | |||
+ | The standard partition is subject to job preemption (killed) because it allows a job submitted to a workgroup-specific partition to release resources tied-up by jobs in the standard partition. In summary, jobs in the standard partition will be preempted (killed with 5 minute grace period) to release resources for the workgroup-specific partition job. For more information on how to handle your job if it is preempted, please refer to [[abstract: | ||
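As a sketch, a batch script that stays within the standard-partition limits might look like the following; the job name, task count, and echoed placeholder are invented for illustration, and no ''--partition'' flag is given because standard is the default partition:

<code bash>
#!/bin/bash
# Hypothetical batch script for the standard partition (all values are
# placeholders). Standard is the default, so no --partition flag is set.
#SBATCH --job-name=example
#SBATCH --time=7-00:00:00   # the 7-day maximum; the default is only 30 minutes
#SBATCH --ntasks=8          # well under the 360-CPUs-per-job limit

NTASKS=8
echo "would run an ${NTASKS}-task program here"
</code>

Remember that the 30-minute default runtime applies if ''--time'' is omitted, so long-running work should always request an explicit runtime.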
+ | |||
+ | ===== The devel partition ===== | ||
+ | |||
+ | This partition is used for short-lived jobs with minimal resource needs. | ||
+ | * Performing compiles of code for projects that otherwise can't be done on the login (head) node and to make sure you are allocated a compute node with the development tools, libraries, etc. which are needed for compilers. | ||
+ | * Running test jobs to vet programs or changes to programs | ||
+ | * Testing correctness of program parallelization | ||
+ | * Interactive sessions | ||
+ | * Removing files especially if cleaning up many files and directories in '' | ||
+ | Because performance is not critical for these use cases, the nodes serviced by the '' | ||
+ | |||
+ | Limits to jobs submitted to this partition are: | ||
+ | * a maximum runtime of 2 hours (default is 30 minutes) | ||
+ | * each user can submit up to 2 jobs | ||
+ | * each job can use up to 4 cores on a single node | ||
+ | |||
+ | For example: | ||
+ | <code bash> | ||
+ | [traine@login01 ~]$ workgroup -g it_css | ||
+ | [(it_css: | ||
+ | Mon Jul 23 15:25:07 EDT 2018 | ||
+ | </ | ||
+ | |||
+ | One copy of the '' | ||
+ | <code bash> | ||
+ | [traine@login01 ~]$ workgroup -g it_css | ||
+ | [(it_css: | ||
+ | salloc: Granted job allocation 940 | ||
+ | salloc: Waiting for resource configuration | ||
+ | salloc: Nodes r00n56 are ready for job | ||
+ | [traine@r00n56 ~]$ echo $SLURM_CPUS_ON_NODE | ||
+ | 2 | ||
+ | </ | ||
+ | |||
+ | ===== The workgroup-specific partitions ===== | ||
+ | |||
+ | The use of // | ||
+ | |||
+ | Limits to jobs submitted to workgroup-specific partitions: | ||
+ | * a maximum runtime of 7 days (default is 30 minutes) | ||
+ | * per-workgroup resource limits (QOS) based on | ||
+ | * how many nodes your research group (workgroup) purchased (node=#) | ||
+ | * how many cores your research group (workgroup) purchased (cpu=#) | ||
+ | * how many GPUs your research group (workgroup) purchased (gres/ | ||
+ | |||
+ | For example: | ||
+ | |||
+ | <code bash> | ||
+ | $ workgroup -g it_nss | ||
+ | $ sbatch --verbose --partition=_workgroup_ … | ||
+ | : | ||
+ | sbatch: partition | ||
+ | : | ||
+ | Submitted batch job 1234 | ||
+ | $ scontrol show job 1234 | egrep -i ' | ||
+ | | ||
+ | | ||
+ | </ | ||
+ | |||
+ | Job 1234 is billed against the it_nss account because it is in the it_nss workgroup partition. | ||
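A batch script can target a workgroup-specific partition with the same ''--partition=_workgroup_'' placeholder used in the ''sbatch'' example above. This is only a sketch (the runtime and task count are invented), and it assumes the current workgroup has already been set with ''workgroup -g'':

<code bash>
#!/bin/bash
# Sketch of a batch script for a workgroup-specific partition.
# _workgroup_ stands in for the partition of your current workgroup
# (it_nss in the example above); the other values are placeholders.
#SBATCH --partition=_workgroup_
#SBATCH --time=1-00:00:00   # within the 7-day maximum
#SBATCH --ntasks=4

PARTITION="_workgroup_"
echo "requested partition: ${PARTITION}"
</code>

The job is then billed against, and limited by, the QOS of the workgroup whose partition it runs in.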
+ | |||
+ | To check what your workgroup has access to and the guaranteed resources on the Caviness refer to [[abstract: | ||