Differences
This shows you the differences between two versions of the page.
abstract:caviness:runjobs:queues [2021-08-12 09:29] – [The devel partition] anita | abstract:caviness:runjobs:queues [2023-05-30 13:48] – [The job queues (partitions) on Caviness] anita | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | |||
- | ====== The job queues (partitions) on Caviness ====== | ||
- | |||
- | The Caviness cluster provides several kinds of partitions (queues) in which to run jobs: | ||
- | |||
- | ^Kind^Description^ | ||
- | |standard|The default partition if no '' | ||
- | |devel|A partition with very short runtime limits and small resource limits; use it for any development work that involves compilers| | ||
- | |workgroup-specific|Partitions associated with specific kinds of compute equipment in the cluster purchased by a research group <<// | ||
- | |||
- | ===== The standard partition ===== | ||
- | |||
- | This partition is the default when no '' | ||
- | |||
- | The standard partition is somewhat like a combination of the standby and spillover queue concepts of the earlier clusters. | ||
- | |||
- | Limits to jobs submitted to this partition are: | ||
- | * a maximum runtime of 7 days (default is 30 minutes) | ||
- | * a maximum of 360 CPUs per job | ||
- | * a maximum of 720 CPUs per user | ||
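As an illustration of a request that stays inside the limits above, the following sketch writes a small batch script for the standard partition. The file name ''standard_job.qs'', the job name, and the task count are hypothetical choices for this example; only the partition name and the limits quoted in the comments come from this page.

```bash
#!/bin/bash
# Sketch: generate a minimal job script targeting the standard partition.
# The file name and job name are hypothetical; adjust them to your project.
cat > standard_job.qs <<'EOF'
#!/bin/bash
#SBATCH --job-name=standard_demo
#SBATCH --partition=standard
# 4 tasks is well under the 360 CPUs-per-job limit listed above
#SBATCH --ntasks=4
# 2 hours is well under the 7 day maximum; without --time the default is 30 minutes
#SBATCH --time=0-02:00:00
echo "Running on $(hostname)"
EOF

# Submit from a login node after selecting your workgroup, for example:
#   sbatch standard_job.qs
echo "Wrote standard_job.qs"
```

Requesting an explicit ''--time'' is worthwhile here because the 30 minute default is much shorter than the 7 day maximum.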
- | |||
- | Jobs in the standard partition are subject to preemption: a job submitted to a workgroup-specific partition can reclaim resources tied up by jobs in the standard partition. A preempted job is killed after a 5-minute grace period so that its resources can be released to the workgroup-specific partition job. For more information on how to handle your job if it is preempted, please refer to [[abstract: | ||
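One way a job script might use the grace period is to trap a termination signal and save its progress before exiting. The sketch below assumes the job's shell receives SIGTERM when the grace period begins; that is an assumption about how preemption is signaled, not something stated on this page. The names ''CHECKPOINT_FILE'', ''save_checkpoint'' and ''run_steps'' are hypothetical, and ''sleep'' stands in for real work.

```bash
#!/bin/bash
# Sketch: record progress when a termination signal arrives.
# Assumption: SIGTERM reaches this shell at the start of the grace period.
CHECKPOINT_FILE="${CHECKPOINT_FILE:-checkpoint.txt}"   # hypothetical file name
step=0

# Write the number of the last completed step to the checkpoint file.
save_checkpoint() {
    echo "last_completed_step=${step}" > "$CHECKPOINT_FILE"
}

# run_steps TOTAL: perform TOTAL units of placeholder work, one second each.
run_steps() {
    local total=$1 i
    # Save state and exit promptly so the job ends inside the grace period.
    trap 'save_checkpoint; exit 0' TERM
    for (( i = 1; i <= total; i++ )); do
        # Run the work in the background and wait on it, so the trap
        # can fire immediately instead of after the command finishes.
        sleep 1 &
        wait $!
        step=$i
    done
    save_checkpoint
}

# Example use inside a job script:
#   run_steps 100
```

On restart, a job could read ''last_completed_step'' from the checkpoint file and resume from the following step.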
- | |||
- | ===== The devel partition ===== | ||
- | |||
- | This partition is used for short-lived jobs with minimal resource needs. Typical uses include: | ||
- | * Compiling code for projects that can't be built on the login (head) node, which also ensures you are allocated a compute node with the development tools, libraries, etc. that compilers need. | ||
- | * Running test jobs to vet programs or changes to programs | ||
- | * Testing correctness of program parallelization | ||
- | * Interactive sessions | ||
- | * Removing files especially if cleaning up many files and directories in '' | ||
- | Because performance is not critical for these use cases, the nodes serviced by the '' | ||
- | |||
- | Limits to jobs submitted to this partition are: | ||
- | * a maximum runtime of 2 hours (default is 30 minutes) | ||
- | * each user can submit up to 2 jobs | ||
- | * each job can use up to 4 cores on a single node | ||
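Before submitting, it can help to check a request against these limits. The helper below is a minimal sketch: ''fits_devel'' and the two constants are hypothetical names, and the values simply restate the 2 hour and 4 core limits listed above (the 2 jobs per user limit is not checked).

```bash
#!/bin/bash
# Limits restated from the list above: 2 hours and 4 cores on a single node.
DEVEL_MAX_MINUTES=120
DEVEL_MAX_CORES=4

# fits_devel CORES MINUTES
# Exit status 0 if the request is within the devel limits, non-zero otherwise.
fits_devel() {
    local cores=$1 minutes=$2
    (( cores >= 1 && cores <= DEVEL_MAX_CORES &&
       minutes >= 1 && minutes <= DEVEL_MAX_MINUTES ))
}

# Example: 2 cores for 30 minutes
if fits_devel 2 30; then
    echo "request fits the devel partition"
else
    echo "request exceeds the devel limits"
fi
```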
- | |||
- | For example: | ||
- | <code bash> | ||
- | [traine@login01 ~]$ workgroup -g it_css | ||
- | [(it_css: | ||
- | Mon Jul 23 15:25:07 EDT 2018 | ||
- | </ | ||
- | |||
- | One copy of the '' | ||
- | <code bash> | ||
- | [traine@login01 ~]$ workgroup -g it_css | ||
- | [(it_css: | ||
- | salloc: Granted job allocation 940 | ||
- | salloc: Waiting for resource configuration | ||
- | salloc: Nodes r00n56 are ready for job | ||
- | [traine@r00n56 ~]$ echo $SLURM_CPUS_ON_NODE | ||
- | 2 | ||
- | </ | ||
- | |||
- | ===== The workgroup-specific partitions ===== | ||
- | |||
- | The use of // | ||
- | |||
- | Limits to jobs submitted to workgroup-specific partitions: | ||
- | * a maximum runtime of 7 days (default is 30 minutes) | ||
- | * per-workgroup resource limits (QOS) based on | ||
- | * how many nodes your research group (workgroup) purchased (node=#) | ||
- | * how many cores your research group (workgroup) purchased (cpu=#) | ||
- | * how many GPUs your research group (workgroup) purchased (gres/ | ||
- | |||
- | For example: | ||
- | |||
- | <code bash> | ||
- | $ workgroup -g it_nss | ||
- | $ sbatch --verbose --partition=_workgroup_ … | ||
- | : | ||
- | sbatch: partition | ||
- | : | ||
- | Submitted batch job 1234 | ||
- | $ scontrol show job 1234 | egrep -i ' | ||
- | | ||
- | | ||
- | </ | ||
- | |||
- | Job 1234 is billed against the it_nss account because it is in the it_nss workgroup partition. | ||
- | |||
- | To check what your workgroup has access to and its guaranteed resources on Caviness, refer to [[abstract: | ||