<booktoc/>
Managing Jobs on Caviness
Once a user has been able to submit jobs to the cluster – interactive or batch – the user will from time to time want to know what those jobs are doing. Is the job waiting in a queue for resources to become available, or is it executing? How long has the job been executing? How much CPU time or memory has the job consumed? Users can query Slurm for job information using the squeue command while the job is still active in Slurm. The squeue command has a variety of command-line options available to customize and filter what information it displays; discussing all of them is beyond the scope of this document. Use the squeue --help or man squeue commands on the login node to view a complete description of available options.
With no options provided, squeue defaults to displaying a list of all jobs submitted by all users on the cluster that are currently active in Slurm. This includes jobs that are waiting in a queue, jobs that are executing, and jobs that are in an error state. The output is presented in a tabular format with the following columns:
Column | Description |
---|---|
JOBID | Numerical identifier assigned when the job was submitted |
PARTITION | The partition to which the job is assigned |
NAME | The name assigned to the job |
USER | The owner of the job |
ST | Current state of the job (see next table) |
TIME | For running jobs, the time elapsed since execution began; for pending jobs, 0:00 |
NODES | The number of nodes assigned to the job |
NODELIST(Reason) | The list of nodes on which the job is running, or the reason the job is in its current (non-running) state |
The different states in which a job may exist are enumerated by the following codes:
State Code | Description |
---|---|
CA | Job was explicitly cancelled by the user or system administrator. The job may or may not have been initiated |
CG | Job is in the process of completing. Some processes on some nodes may still be active |
F | Job terminated with non-zero exit code or other failure condition |
PD | Job is awaiting resource allocation |
PR | Job terminated due to preemption |
R | Job currently has an allocation and is running |
RD | Job is held |
RQ | Completing job is being requeued |
RS | Job is about to change size |
S | Job has an allocation, but execution has been suspended and CPUs have been released for other jobs |
TO | Job terminated upon reaching its time limit |
There are many other possible job state codes, which can be found in the official Slurm documentation.
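These codes can also be used to filter the listing: the -t (or --states) option, summarized in the next section, accepts a comma-separated list of state codes. A quick sketch:

squeue -t PD,R        # show only jobs that are pending or running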
Checking job status
squeue or ssqueue
Use the squeue command to check the status of queued jobs. The ssqueue version is a UD IT version of squeue that presents its output as an interactive spreadsheet using the curses display library; each column in the spreadsheet is sized to match the longest string present, rather than being truncated at a fixed width as the native Slurm commands do.
Use the squeue --help or man squeue commands on the login node to view a complete description of available options. The same options work with ssqueue too. Some of the most often-used options are summarized here:
Option | Result |
---|---|
-j «job_id_list» | Displays information for specified job(s) |
-u «user_list» | Displays information for jobs associated with the specified user(s) |
--start | List estimated start times for queued jobs |
-t «state_list» | List only jobs in the specified state(s) |
--Format=«field_list» | Customize the output fields of squeue |
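These options can be combined. For example, a sketch (using the hypothetical user traine) that checks estimated start times for that user's pending jobs, then requests a custom set of output fields:

squeue -u traine --start                               # estimated start times for pending jobs
squeue -u traine --Format=jobid,name,state,reasonlist  # choose which fields to display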
As noted earlier, with no options provided squeue defaults to displaying a list of all jobs submitted by all users on the cluster that are currently active in Slurm: jobs waiting in a queue, jobs executing, and jobs in an error state.
[(it_css:traine)@login00 it_css]$ squeue
 JOBID  PARTITION     NAME     USER ST        TIME NODES NODELIST(REASON)
488072  ccei_biom    P-irc rameswar CG  3-23:26:11     1 r01n41
496264        clj NOVA_Hcs delgorio PD        0:00     4 (QOSGrpCpuLimit)
496312        clj NOVA_Hcs delgorio PD        0:00     4 (QOSGrpCpuLimit)
502942   standard run.a8.p     heil PD        0:00     4 (Resources)
502954   standard run.s8.p     heil PD        0:00     3 (Priority)
502970   standard PEG44_EQ  utkarsk PD        0:00     2 (Priority)
502821   standard R3_FineM   yashar PD        0:00     2 (Priority)
502822   standard B9_ResCo   yashar PD        0:00     2 (Priority)
502823   standard R3_ResCo   yashar PD        0:00     2 (Priority)
502979   standard MOH_2Pro  utkarsk PD        0:00     2 (Priority)
502967   standard run_442c     heil PD        0:00     1 (Dependency)
502975   standard run.s10.     heil PD        0:00     1 (Dependency)
496263        clj NOVA_Hcs delgorio  R     2:33:10     4 r00n[25-26,36-37]
501784  elliottla fluidvis     safa  R  1-02:51:00     1 r01n53
498144  kukulka_l ph005crc txthoman  R  2-21:01:27     1 r00n17
498436  biophysic      CdI     cyxu  R  1-21:29:05     1 r01g04
.
.
.
The squeue command can also be used to see the job status information of all current jobs for a particular user by using the -u flag followed by the username. For example,
[(it_css:traine)@login00 ~]$ squeue -u traine
 JOBID PARTITION     NAME   USER ST  TIME NODES NODELIST(REASON)
   913     devel     date traine PD  0:00     1 (QOSMaxJobsPerUserLimit)
   912  standard openmp_j traine  R  0:23     1 r00n45
   911     devel       sh traine  R  1:14     1 r00n56
   910     devel       sh traine  R  3:30     1 r00n56
shows only the jobs currently in Slurm for user traine
. Jobs 910, 911, and 912 are currently running. Job 913 is pending due to a "MaxJobsPerUser" limit enforced by the quality-of-service (QOS) level associated with the devel partition (see the section on Partitions for discussion of these limits).
Notably absent from both the standard and long format for squeue
are the resources (CPU cores, memory, GPUs) allocated to the job. Historically, Slurm was heavily focused on scheduling nodes themselves and not the component resources on those nodes, and this is reflected in the data it tends to display by default. The --Format
flag must be used to augment the displayed information:
[(it_css:traine)@login00 ~]$ squeue --Format="jobid,name,state,partition,account,username,numnodes,numtasks,numcpus,gres"
JOBID     NAME        STATE    PARTITION  ACCOUNT  USER   NODES  TASKS  CPUS  GRES
922       date        PENDING  devel      it_nss   frey   1      1      1     (null)
921       sh          RUNNING  devel      it_nss   frey   1      2      2     gpu:p100
918       openmp_job  RUNNING  standard   it_nss   frey   1      1      4     (null)
915       sh          RUNNING  devel      it_nss   frey   1      4      4     (null)
Job status is PD
When your job status is PD, it means your job is queued and waiting to execute. When you check with squeue, you might see something like this:
[(it_css:traine)@traine it_css]$ squeue -u traine
 JOBID PARTITION  NAME   USER ST  TIME NODES NODELIST(REASON)
  1354  standard  A3WF traine PD  0:00     1 (Priority)
Sometimes your job is stuck and remains in the PD
state and never starts running. You can use squeue
in combination with -j
and --start
to see the estimated start time of your pending job.
[(it_css:traine)@traine it_css]$ squeue -j 91487 --start
 JOBID PARTITION     NAME   USER ST          START_TIME NODES NODELIST(REASON)
 91487  standard hello_te traine PD 2018-07-09T13:09:40     2 (Priority)
sstat or ssstat
To check the status information of a running job/step, use the sstat command with appropriate options. The ssstat version is a UD IT version of sstat that presents its output as an interactive spreadsheet using the curses display library; each column in the spreadsheet is sized to match the longest string present, rather than being truncated at a fixed width as the native Slurm commands do.
Some of the useful options are mentioned below, but keep in mind you can only use this command for jobs you own. The same options work with ssstat too. A detailed list and explanation of the options and resulting column values can be found in the man page, man sstat.
Option | Result |
---|---|
-a | Print all steps for the given job(s) when no step is specified |
-u «user_list» | Displays information for jobs associated with the specified user(s) |
-i | Predefined format to list the pids running for each job step. (JobId,Nodes,Pids) |
-p | Produce output delimited by vertical bars, with a trailing delimiter at the end of each line |
--format | Customize output of sstat |
[(it_css:traine)@login00 it_css]$ sstat -p --format=AveCPU,AvePages,AveRSS,AveVMSize,JobID -j 1354
25:02.000|0K|1.37M|5.93M|9.0|
scontrol
scontrol is used for monitoring and modifying queued jobs, as well as holding and releasing them. One of its most powerful options is scontrol show job with the JobID.
By default, the per-job output of squeue is limited to a name, submit/run time, and some state flags; Slurm maintains a far more extensive set of parameters for each job, which can be viewed using scontrol. For example,
[(it_css:traine)@login00 it_css]$ scontrol show job 1354
JobId=1354 JobName=A33_6_WF.qs
   UserId=traine(1111) GroupId=xxxx(10111) MCS_label=N/A
   Priority=2181 Nice=0 Account=thsu QOS=normal
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=0 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=05:04:27 TimeLimit=1-00:00:00 TimeMin=N/A
   SubmitTime=2018-08-14T09:56:15 EligibleTime=2018-08-14T09:56:15
   StartTime=2018-08-14T09:56:16 EndTime=2018-08-15T09:56:16 Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   LastSchedEval=2018-08-14T09:56:16
   Partition=standard AllocNode:Sid=login01:19458
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=r00n16
   BatchHost=r00n16
   NumNodes=1 NumCPUs=20 NumTasks=20 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=20,mem=60G,node=1,billing=49168
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryCPU=3G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   Gres=(null) Reservation=(null)
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/lustre/scratch/traine/AAd3F.qs
   WorkDir=/lustre/scratch/traine/
   StdErr=/lustre/scratch/traine/ler/%x.o1354
   StdIn=/dev/null
   StdOut=/lustre/scratch/traine/ler/%x.o1354
   Power=
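Beyond displaying job details, scontrol can also hold, release, and (within limits) modify your own queued jobs. A minimal sketch, assuming a pending job with JobID 1354; note that many attributes can only be changed while the job is still pending:

scontrol hold 1354                             # prevent the pending job from starting
scontrol release 1354                          # make the job eligible to run again
scontrol update JobId=1354 TimeLimit=12:00:00  # reduce the job's requested time limit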
sacct or ssacct
To check the information about a job from history (i.e., a job that has already completed), sacct can be used to fetch various details as long as you know the JobID. The ssacct version is a UD IT version of sacct that presents its output as an interactive spreadsheet using the curses display library; each column in the spreadsheet is sized to match the longest string present, rather than being truncated at a fixed width as the native Slurm commands do.
[(it_css:traine)@login01 ~]$ sacct -j 10544
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
10544            Mg_FFS   standard  afwallace         10  PREEMPTED      0:0
10544.batch       batch             afwallace         10  CANCELLED     0:15
10544.extern     extern             afwallace         10  COMPLETED      0:0
In the above example, the output indicates the job was preempted from the standard partition. Remember, jobs in the standard partition are preempted to make way for workgroup-specific jobs. If we want to check how much time a job was given to run, we can use the --format option to display the TimeLimit for JobID 10544.
[(it_css:traine)@login01 ~]$ sacct -j 10544 --format=TimeLimit
 Timelimit
----------
7-00:00:00
From this we can see the job had 7 days to complete, the maximum time limit on Caviness. However, because the job was running in the standard partition, it was preempted after running 21 hours, 5 minutes, and 10 seconds. Below is an example requesting additional information such as memory, number of nodes, list of nodes, etc.
[(it_css:traine)@login01 ~]$ sacct -j 10544 --format=user,jobid,jobname,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist
abrarq  10544         Mg_FFS  standard  PREEMPTED  7-00:00:00  2018-10-29T19:23:54  2018-10-30T16:29:04  21:05:10                     1  10  r00n25
        10544.batch    batch            CANCELLED              2018-10-29T19:23:54  2018-10-30T16:29:05  21:05:11  219520K  182124K   1  10  r00n25
        10544.extern  extern            COMPLETED              2018-10-29T19:23:54  2018-10-30T16:23:50  20:59:56       4K  107904K   1  10  r00n25
Here is another example showing a job that exited because not enough memory was specified for the job. This may be indicated in the Slurm output as oom-kill event(s), meaning out of memory; sacct shows the truncated state OUT_OF_ME+, which likewise means OUT_OF_MEMORY.
[traine@login00 ~]$ sacct -j 9176640
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
9176640             FVM   standard zurakowski          1 OUT_OF_ME+    0:125
9176640.bat+      batch            zurakowski          1 OUT_OF_ME+    0:125
9176640.ext+     extern            zurakowski          1  COMPLETED      0:0
9176640.0          date            zurakowski          1  COMPLETED      0:0
Using ssacct -j 17035767 instead displays the full text, clearly showing OUT_OF_MEMORY, as follows:
┌──────────────────────────────────────────────────────────────────────────────┐
│JobID           │ JobName │ Partition │ Account │ AllocCPUS │ State         │ │
│────────────────┼─────────┼───────────┼─────────┼───────────┼───────────────┼─│
│17035767        │ sbatch  │ devel     │ it_nss  │ 1         │ OUT_OF_MEMORY │ │
│17035767.batch  │ batch   │           │ it_nss  │ 1         │ OUT_OF_MEMORY │ │
│17035767.extern │ extern  │           │ it_nss  │ 1         │ COMPLETED     │ │
│                │         │           │         │           │               │ │
│                │         │           │         │           │               │ │
│──────────────────────────────────────────────────────────────────────────────│
│ [Q]uit  [P]rev/[N]ext page  Page [L]eft/[R]ight  [E]nd/[B]eginning of list   │
└──────────────────────────────────────────────────────────────────────────────┘
Pressing R to move right to see more columns yields the following:
┌──────────────────────────────────────────────────────────────────────────────┐
│        │ JobName │ Partition │ Account │ AllocCPUS │ State         │ ExitCode│
│────────┼─────────┼───────────┼─────────┼───────────┼───────────────┼─────────│
│        │ sbatch  │ devel     │ it_nss  │ 1         │ OUT_OF_MEMORY │ 0:125   │
│.batch  │ batch   │           │ it_nss  │ 1         │ OUT_OF_MEMORY │ 0:125   │
│.extern │ extern  │           │ it_nss  │ 1         │ COMPLETED     │ 0:0     │
│        │         │           │         │           │               │         │
│        │         │           │         │           │               │         │
│──────────────────────────────────────────────────────────────────────────────│
│ [Q]uit  [P]rev/[N]ext page  Page [L]eft/[R]ight  [E]nd/[B]eginning of list   │
└──────────────────────────────────────────────────────────────────────────────┘
Deleting a job
Use the scancel «job_id» command to remove pending and running jobs from the queue.
For example, to delete job 28000:
scancel 28000
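scancel can also select jobs by owner, name, or state rather than by a single job id; a few hedged examples (the user and job names are illustrative):

scancel -u traine            # cancel all of your jobs (you may only cancel jobs you own)
scancel -u traine -t PD      # cancel only your pending jobs
scancel --name=openmp_job    # cancel jobs with a particular name
scancel 28000_5              # cancel a single task of array job 28000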
Available Resources
Although no one can claim ownership of particular nodes on Caviness, a fair-share policy has been implemented based on each workgroup's purchases, using Slurm's quality-of-service (QOS) concept. Slurm has no ability to represent per-account/per-partition limits via associations, so the Slurm QOS facility must be used instead. Each QOS has its own set of aggregate limits that apply regardless of the account, user, or partition involved with jobs. Access to each QOS can be granted by means of associations, and each QOS is usable only on those partitions that explicitly allow it.
Each investing entity (workgroup) is associated with a QOS (with the same name as the workgroup), which is in turn mapped to the various workgroup partitions based on the purchases.
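The aggregate limits attached to a workgroup's QOS can be inspected with the standard sacctmgr command; for example, a sketch using it_css as the QOS name:

sacctmgr show qos it_css format=Name%12,GrpTRES%40   # group-wide TRES limits for the it_css QOS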
One can determine the nodes that your workgroup has access to by using the scontrol command. Since each workgroup has a partition named after it, the command to check the nodes for the investing entity (workgroup) it_css would be
[traine@login01 bin]$ scontrol show partition it_css
PartitionName=it_css
   AllowGroups=ALL AllowAccounts=it_css AllowQos=priority-access,normal
   AllocNodes=ALL Default=NO QoS=it_css
   DefaultTime=00:30:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=7-00:00:00 MinNodes=1 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=r00g[01-04],r00n[01-55],r01g[00-04],r01n[01-55],r02s[00-01]
   PriorityJobFactor=1 PriorityTier=10 RootOnly=NO ReqResv=NO OverSubscribe=YES:4
   OverTimeLimit=NONE PreemptMode=REQUEUE
   State=UP TotalCPUs=4356 TotalNodes=121 SelectTypeParameters=NONE
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
   TRESBillingWeights=CPU=1.0,Mem=1.0
To check the usable memory on each accessible node of a given investing entity (workgroup), please use the qhost command (described below).
sworkgroup
To get a summary of the partitions and resources that you and your workgroup have access to, use the sworkgroup command (a UD IT customized utility):
[(it_css:traine)@login01 ~]$ sworkgroup --help
usage: sworkgroup [-h] [--workgroup group] [--limits] [--noheader]
                  [--human-readable] [--parseable]

Show Slurm partitions and limits for a workgroup

optional arguments:
  -h, --help            show this help message and exit
  --workgroup group, -g group
                        display partitions available to a specific workgroup
  --limits, -l          show TRES limits for each partition
  --noheader, -N        do not display column headers on output
  --human-readable, -H  show TRES in a more human-readable format
  --parseable           show as parseable lines rather than a tabular format
TRES stands for Trackable RESources: the resources whose consumption Slurm tracks in the scheduler, such as CPU usage, memory usage, and time. This is important because cluster usage expressed in TRES factors into the fair-share calculation; TRES also allows the scheduler to charge users back for how much of their available resources on the cluster they have used.
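Since TRES usage feeds the fair-share calculation, the standard Slurm sshare command can be used to see where an account currently stands; a sketch, with it_css as an example account:

sshare -A it_css        # fair-share summary for the it_css account
sshare -a -A it_css     # the same, broken down by each user in the account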
[(it_css:traine)@login01 ~]$ sworkgroup --workgroup=it_css --limits
Partition Per user Per job Per workgroup
---------+--------+-------+-----------------------------
devel     2 jobs   cpu=4
it_css                     cpu=72,node=2,gres/gpu:p100=1
standard  cpu=720  cpu=360
Host Status, Literally (sinfo or ssinfo)
Slurm also features a sinfo command that reports the status of the cluster's nodes themselves; no job information is displayed. The ssinfo version is a UD IT version of sinfo that presents its output as an interactive spreadsheet using the curses display library; each column in the spreadsheet is sized to match the longest string present, rather than being truncated at a fixed width as the native Slurm commands do. The same options for sinfo work with ssinfo too.
[(it_css:traine)@login00 it_css]$ sinfo --long --Node | more
Thu May 26 11:55:03 2022
NODELIST  NODES PARTITION        STATE      CPUS  S:C:T   MEMORY  TMP_DISK WEIGHT AVAIL_FE REASON
r00g00        1 devel            idle         72  2:36:1  126976    910149      1 Gen1,E5- none
r00g01        1 it_css           mixed        36  2:18:1  126976    910149  10000 Gen1,E5- none
r00g01        1 it_nss           mixed        36  2:18:1  126976    910149  10000 Gen1,E5- none
r00g01        1 kirby            mixed        36  2:18:1  126976    910149  10000 Gen1,E5- none
r00g01        1 standard*        mixed        36  2:18:1  126976    910149  10000 Gen1,E5- none
r00g01        1 reserved         mixed        36  2:18:1  126976    910149  10000 Gen1,E5- none
r00g02        1 it_css           allocated    36  2:18:1  256000    910149  20000 Gen1,E5- none
r00g02        1 it_nss           allocated    36  2:18:1  256000    910149  20000 Gen1,E5- none
r00g02        1 biophysics       allocated    36  2:18:1  256000    910149  20000 Gen1,E5- none
r00g02        1 standard*        allocated    36  2:18:1  256000    910149  20000 Gen1,E5- none
r00g02        1 reserved         allocated    36  2:18:1  256000    910149  20000 Gen1,E5- none
r00g03        1 it_css           mixed        36  2:18:1  514048    910149  30000 Gen1,E5- none
r00g03        1 it_nss           mixed        36  2:18:1  514048    910149  30000 Gen1,E5- none
r00g03        1 ececis_research  mixed        36  2:18:1  514048    910149  30000 Gen1,E5- none
r00g03        1 standard*        mixed        36  2:18:1  514048    910149  30000 Gen1,E5- none
r00g03        1 reserved         mixed        36  2:18:1  514048    910149  30000 Gen1,E5- none
r00g04        1 it_css           allocated    36  2:18:1  256000    910149  20000 Gen1,E5- none
r00g04        1 it_nss           allocated    36  2:18:1  256000    910149  20000 Gen1,E5- none
r00g04        1 biophysics       allocated    36  2:18:1  256000    910149  20000 Gen1,E5- none
r00g04        1 standard*        allocated    36  2:18:1  256000    910149  20000 Gen1,E5- none
r00g04        1 reserved         allocated    36  2:18:1  256000    910149  20000 Gen1,E5- none
.
.
r04n76        1 kirby            mixed        40  2:20:1  191488    910149     10 Gen2.1,G none
r04n76        1 kuehl_group      mixed        40  2:20:1  191488    910149     10 Gen2.1,G none
r04n76        1 jayaraman_lab    mixed        40  2:20:1  191488    910149     10 Gen2.1,G none
r04n76        1 dditoro          mixed        40  2:20:1  191488    910149     10 Gen2.1,G none
r04n76        1 it_css           mixed        40  2:20:1  191488    910149     10 Gen2.1,G none
r04n76        1 it_nss           mixed        40  2:20:1  191488    910149     10 Gen2.1,G none
r04n76        1 reserved         mixed        40  2:20:1  191488    910149     10 Gen2.1,G none
r04n76        1 standard*        mixed        40  2:20:1  191488    910149     10 Gen2.1,G none
r04s00        1 lianglab         mixed        40  2:20:1  385024         0 100000 Gen2.1,G none
r04s00        1 ecosys           mixed        40  2:20:1  385024         0 100000 Gen2.1,G none
r04s00        1 it_css           mixed        40  2:20:1  385024         0 100000 Gen2.1,G none
r04s00        1 it_nss           mixed        40  2:20:1  385024         0 100000 Gen2.1,G none
r04s00        1 reserved         mixed        40  2:20:1  385024         0 100000 Gen2.1,G none
r04s00        1 standard*        mixed        40  2:20:1  385024         0 100000 Gen2.1,G none
r04s01        1 lianglab         mixed        40  2:20:1  385024         0 100000 Gen2.1,G none
r04s01        1 ecosys           mixed        40  2:20:1  385024         0 100000 Gen2.1,G none
r04s01        1 it_css           mixed        40  2:20:1  385024         0 100000 Gen2.1,G none
r04s01        1 it_nss           mixed        40  2:20:1  385024         0 100000 Gen2.1,G none
r04s01        1 reserved         mixed        40  2:20:1  385024         0 100000 Gen2.1,G none
r04s01        1 standard*        mixed        40  2:20:1  385024         0 100000 Gen2.1,G none
The command has other options that will be discussed elsewhere; view the sinfo
man page for a description of all options available.
MEMORY refers to the size of memory for each node in megabytes. The column S:C:T gives extended processor information: the number of sockets, cores, and threads (S:C:T) per node. TMP_DISK refers to the size of temporary disk space per node in megabytes, and the WEIGHT column gives the scheduling weight of each node.
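sinfo output can also be filtered and condensed with standard flags; for example, a sketch that looks for currently idle nodes in a partition and then summarizes node counts by state:

sinfo -p standard -t idle    # nodes in the standard partition that are currently idle
sinfo -p standard -s         # one-line summary of allocated/idle/other/total node counts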
To fetch the information for one particular node, add the -n argument followed by the node name.
[traine@login01.caviness ~]$ sinfo --long --Node -n r01n55
Thu May 26 11:52:36 2022
NODELIST  NODES PARTITION      STATE  CPUS  S:C:T   MEMORY  TMP_DISK WEIGHT AVAIL_FE REASON
r01n55        1 it_css         mixed    36  2:18:1  126976    910149     10 Gen1,E5- none
r01n55        1 it_nss         mixed    36  2:18:1  126976    910149     10 Gen1,E5- none
r01n55        1 afwallace      mixed    36  2:18:1  126976    910149     10 Gen1,E5- none
r01n55        1 cbbi           mixed    36  2:18:1  126976    910149     10 Gen1,E5- none
r01n55        1 ccei_biomass   mixed    36  2:18:1  126976    910149     10 Gen1,E5- none
r01n55        1 cieg_core      mixed    36  2:18:1  126976    910149     10 Gen1,E5- none
r01n55        1 clj            mixed    36  2:18:1  126976    910149     10 Gen1,E5- none
r01n55        1 dditoro        mixed    36  2:18:1  126976    910149     10 Gen1,E5- none
r01n55        1 disasters      mixed    36  2:18:1  126976    910149     10 Gen1,E5- none
r01n55        1 elliottlab     mixed    36  2:18:1  126976    910149     10 Gen1,E5- none
r01n55        1 gleghorn       mixed    36  2:18:1  126976    910149     10 Gen1,E5- none
r01n55        1 jayaraman_lab  mixed    36  2:18:1  126976    910149     10 Gen1,E5- none
r01n55        1 kukulka_lab    mixed    36  2:18:1  126976    910149     10 Gen1,E5- none
r01n55        1 mcconnell      mixed    36  2:18:1  126976    910149     10 Gen1,E5- none
r01n55        1 oceans         mixed    36  2:18:1  126976    910149     10 Gen1,E5- none
r01n55        1 orb            mixed    36  2:18:1  126976    910149     10 Gen1,E5- none
r01n55        1 roberts_lab    mixed    36  2:18:1  126976    910149     10 Gen1,E5- none
r01n55        1 thsu           mixed    36  2:18:1  126976    910149     10 Gen1,E5- none
r01n55        1 ud_zlab        mixed    36  2:18:1  126976    910149     10 Gen1,E5- none
r01n55        1 udgeotech      mixed    36  2:18:1  126976    910149     10 Gen1,E5- none
r01n55        1 zurakowski     mixed    36  2:18:1  126976    910149     10 Gen1,E5- none
r01n55        1 standard*      mixed    36  2:18:1  126976    910149     10 Gen1,E5- none
r01n55        1 reserved       mixed    36  2:18:1  126976    910149     10 Gen1,E5- none
To fetch the information for all the nodes in a particular partition, use the -p argument followed by the partition name.
[traine@login01.caviness ~]$ sinfo --long --Node -p biophysics
Wed Apr 19 11:32:53 2023
NODELIST  NODES PARTITION   STATE      CPUS  S:C:T   MEMORY  TMP_DISK WEIGHT AVAIL_FE REASON
r00g02        1 biophysics  allocated    36  2:18:1  256000    910149  20000 Gen1,E5- none
r00g04        1 biophysics  idle         36  2:18:1  256000    910149  20000 Gen1,E5- none
r01g02        1 biophysics  idle         36  2:18:1  256000    910149  20000 Gen1,E5- none
r01g03        1 biophysics  idle         36  2:18:1  256000    910149  20000 Gen1,E5- none
r01g04        1 biophysics  idle         36  2:18:1  256000    910149  20000 Gen1,E5- none
Here is the output from ssinfo --long --Node -p biophysics, which shows the full text, especially in the AVAIL_FEATURES column:
┌───────────────────────────────────────────────────────────────────────────┐
│Wed Apr 19 11:26:28 2023                                                   │
│────────────────────────                                                   │
│NODELIST │ NODES │ PARTITION  │ STATE     │ CPUS │ S:C:T  ││
│r00g02   │ 1     │ biophysics │ allocated │ 36   │ 2:18:1 ││
│r00g04   │ 1     │ biophysics │ idle      │ 36   │ 2:18:1 ││
│r01g02   │ 1     │ biophysics │ idle      │ 36   │ 2:18:1 ││
│r01g03   │ 1     │ biophysics │ idle      │ 36   │ 2:18:1 ││
│r01g04   │ 1     │ biophysics │ idle      │ 36   │ 2:18:1 ││
│                                                                           │
│───────────────────────────────────────────────────────────────────────────│
│ [Q]uit  [P]rev/[N]ext page  Page [L]eft/[R]ight  [E]nd/[B]eginning of l   │
└───────────────────────────────────────────────────────────────────────────┘
Pressing R to move right to see more columns yields the following:
┌───────────────────────────────────────────────────────────────────────────┐
│                                                                           │
│S:C:T  │ MEMORY │ TMP_DISK │ WEIGHT │ AVAIL_FEATURES               │ REASON│
│2:18:1 │ 256000 │ 910149   │ 20000  │ Gen1,E5-2695,E5-2695v4,256GB │ none  │
│2:18:1 │ 256000 │ 910149   │ 20000  │ Gen1,E5-2695,E5-2695v4,256GB │ none  │
│2:18:1 │ 256000 │ 910149   │ 20000  │ Gen1,E5-2695,E5-2695v4,256GB │ none  │
│2:18:1 │ 256000 │ 910149   │ 20000  │ Gen1,E5-2695,E5-2695v4,256GB │ none  │
│2:18:1 │ 256000 │ 910149   │ 20000  │ Gen1,E5-2695,E5-2695v4,256GB │ none  │
│                                                                           │
│───────────────────────────────────────────────────────────────────────────│
│ [Q]uit  [P]rev/[N]ext page  Page [L]eft/[R]ight  [E]nd/[B]eginning of l   │
└───────────────────────────────────────────────────────────────────────────┘
UD IT Status Commands
UD IT has created additional status summary commands that build on the squeue and sacct commands provided by Slurm. In most cases these commands are conveniences that fill in some of the options summarized above for the user. All commands display a terse summary of their options if the -h or --help option is provided with the command.
sjobs
The sjobs
command displays job status in a more compact format:
[traine@login01 ~]$ sjobs -h
usage: sjobs [-h] [-a] [-G] [-g <workgroup>] [-u <username>] [-t] [-H]
             [--parseable] [-d <string>]

Display information for running and queued jobs

optional arguments:
  -h, --help            show this help message and exit

job selection options:
  -a, --all             show all users (not just yourself)
  -G, --current-group   show jobs for users in your current workgroup
  -g <workgroup>, --group <workgroup>
                        show jobs for users in the given workgroup (can be
                        used multiple times)
  -u <username>, --user <username>
                        show jobs for the given user (can be used multiple
                        times)

output options:
  -t, --totals          show total jobs, users, groups, cpus, nodes, and
                        tasks
  -H, --no-header       do not show column headers
  --parseable           do not display in a tabular format
  -d <string>, --delimiter <string>
                        when --parseable is selected, separate columns with
                        this string

If no job selection options are provided then jobs for the current user will
be displayed.
[(it_css:traine)@login01 ~]$ sjobs
JOBID USER   STATE   JOBNAME    GROUP  NCPUS NNODES NTASKS
----- ------ ------- ---------- ------ ----- ------ ------
 1904 traine RUNNING arrayJob   it_css     1      1      1
 1903 traine RUNNING openmp_job it_css     8      1      1
By default, only jobs owned by the user issuing the command are displayed. The -g
option followed by the group name displays jobs for users who are members of that specific group:
[(it_css:traine)@login00 ~]$ sjobs -g ud_zlab
JOBID  USER    STATE   JOBNAME     GROUP   NCPUS NNODES NTASKS
------ ------- ------- ----------- ------- ----- ------ ------
351369 fazle   PENDING polySiC     ud_zlab   128      4    128
349950 zhangzc RUNNING L9.0.1.0.0  ud_zlab    32      2     32
349944 zhangzc RUNNING L5z.2.1.0.0 ud_zlab    32      2     32
349929 zhangzc RUNNING L5z.0.1.0.0 ud_zlab    32      2     32
The -a
option displays jobs for all cluster users.
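Per the help text above, the selection and output options can be combined; a couple of sketches:

sjobs -a -t                        # all users' jobs, with totals appended
sjobs -g it_css --parseable -d ,   # one workgroup's jobs as comma-delimited lines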
sworkgroup
Another UD IT command that displays the partitions and resources a workgroup has access to on the Caviness cluster. This has already been discussed above under Available Resources.
qhost
qhost is a wrapper written for Caviness that consolidates information collected from Slurm commands and joins it together to display host/node information in a fashion similar to that of qhost on Farber.
[traine@login01 ~]$ qhost --help
usage: qhost [-help] [-h <hostlist>] [-ncb] [-j] [-u <username>] [--help]
             [--hosts <hostlist>] [--jobs] [--users <userlist>]
             [--no-node-sort] [--std-host-topo]

Display host information akin to Grid Engine qhost

original options:
  options inherited from Grid Engine qhost

  -help                 print this help
  -h <hostlist>         display only selected hosts
  -ncb                  suppress host topology based information
  -j                    display jobs running on the host(s)
  -u <username>         show only jobs for user

extended options:
  additional options not present in Grid Engine qhost

  --help                alternate to -help
  --hosts <hostlist>    alternate to -h
  --jobs                alternate to -j
  --users <userlist>    alternate to -u
  --no-node-sort, -S    do not sort the list of nodes by name before
                        displaying
  --std-host-topo, -T   show host topology as sockets, cores-per-socket, and
                        threads-per-core

A <hostlist> is one or more node names or name patterns (e.g. r00n[00-24])
separated by commas. A <userlist> is one or more user names separated by
commas
The following example shows the specifications for node r01n47. It is helpful to understand that the actual usable memory on r01n47 is 124.0G, versus the physical memory, which is 128G. This is important when specifying the amount of memory as an option, especially when using a workgroup partition. See details about managing jobs using command options for sbatch: Memory
[traine@login01 ~]$ qhost -h r01n47
HOSTNAME                ARCH        NCPU NSOC NCOR NTHR NLOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
----------------------------------------------------------------------------------------------
r01n47                  E5-2695v4     36    2   36   36  0.99  124.0G   36.0G     0.0     0.0
Adding the option -j will also show the jobs running on node r01n47 in this example.
[traine@login01 ~]$ qhost -h r01n47 -j
HOSTNAME                ARCH        NCPU NSOC NCOR NTHR NLOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
----------------------------------------------------------------------------------------------
r01n47                  E5-2695v4     36    2   36   36  0.42  125.8G    0.0M     0.0     0.0
   job-ID  prior  name           user          state  submit/start at      queue     master  ja-task-ID
   -----------------------------------------------------------------------------------------------------------------
   139102  2788   BLSSM_NU1f197  hpcguest1015  R      2018-12-05T10:19:28  standard  MASTER
   139103  2788   BLSSM_NU1f198  hpcguest1015  R      2018-12-05T10:19:28  standard  MASTER
   139104  2788   BLSSM_NU1f199  hpcguest1015  R      2018-12-05T10:19:28  standard  MASTER
   139105  2788   BLSSM_NU1f200  hpcguest1015  R      2018-12-05T10:19:28  standard  MASTER
   139106  2788   BLSSM_NU1f201  hpcguest1015  R      2018-12-05T10:19:28  standard  MASTER
   139107  2788   BLSSM_NU1f202  hpcguest1015  R      2018-12-05T10:19:28  standard  MASTER
   139108  2788   BLSSM_NU1f203  hpcguest1015  R      2018-12-05T10:19:28  standard  MASTER
   139109  2788   BLSSM_NU1f204  hpcguest1015  R      2018-12-05T10:19:28  standard  MASTER
   139110  2788   BLSSM_NU1f205  hpcguest1015  R      2018-12-05T10:19:28  standard  MASTER
   139111  2788   BLSSM_NU1f206  hpcguest1015  R      2018-12-05T10:19:28  standard  MASTER
   139112  2788   BLSSM_NU1f207  hpcguest1015  R      2018-12-05T10:19:28  standard  MASTER
   139113  2788   BLSSM_NU1f208  hpcguest1015  R      2018-12-05T10:19:28  standard  MASTER
   139114  2788   BLSSM_NU1f209  hpcguest1015  R      2018-12-05T10:19:28  standard  MASTER
   139115  2788   BLSSM_NU1f210  hpcguest1015  R      2018-12-05T10:19:28  standard  MASTER
   139116  2788   BLSSM_NU1f211  hpcguest1015  R      2018-12-05T10:19:28  standard  MASTER
squota
squota
is similar to the qquota
command available on Mills and Farber. squota
displays the current utilization of guaranteed (purchased) resources for the workgroup.
[traine@login00 slurm]$ squota --help
usage: squota [-h] [--json] [--yaml] [-g <workgroup>]

Display workgroup resource quota usage

optional arguments:
  -h, --help            show this help message and exit
  --json                output in JSON format
  --yaml                output in YAML format
  -g <workgroup>, --group <workgroup>
                        display a specific workgroup (by name); without this
                        flag, the Unix group of the calling process is used
[traine@login00 ~]$ squota -g ececis_research
resource         used   limit    pct
~~~~~~~~~~~~~ ~~~~~~~ ~~~~~~~ ~~~~~~
node               13
mem           1732608 1927168  89.9%
gres/gpu            4       4 100.0%
gres/gpu:v100       0       2   0.0%
gres/gpu:p100       1       2  50.0%
cpu               152     152 100.0%
This particular group purchased 4 GPUs (2 gpu:v100 and 2 gpu:p100). All 4 purchased GPUs are currently in use (3 jobs did not specify a GPU model as part of the submission, and 1 job specified gpu:p100).
squota uses gres/gpu to represent the total number of GPUs purchased by the workgroup (i.e., their total number of guaranteed GPU resources) and their total usage, while the listings gres/gpu:v100, gres/gpu:p100, and gres/gpu:t4 represent a breakdown of which GPU models were purchased by the workgroup, with usage counted when a GPU model is specified as part of the job submission.
[traine@login01 ~]$ squota -g ccei_biomass --yaml
cpu:
  limit: 1596
  percentage: 50.68922305764411
  usage: 809
gres/gpu:
  limit: 4
  percentage: 0.0
  usage: 0
gres/gpu:t4:
  limit: 2
  percentage: 0.0
  usage: 0
gres/gpu:v100:
  limit: 2
  percentage: 0.0
  usage: 0
mem:
  limit: 9268224
  percentage: 45.50878355982764
  usage: 4217856
node:
  usage: 25
This particular group also purchased 4 GPUs (2 gpu:t4 and 2 gpu:v100). None of the 4 purchased GPUs are currently in use.
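The --json and --yaml flags make squota output easy to post-process in scripts; for example, a sketch that pretty-prints the JSON report (assuming python3 is available on the login node):

squota -g it_css --json | python3 -m json.tool    # pretty-print the JSON quota report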
qstatgrp
qstatgrp makes use of the Slurm commands squeue and sacctmgr to display the current utilization of resources within a workgroup. Using the -h or --help argument will display the list of operations and options that can be used with qstatgrp.
[(it_css:traine)@login00 ~]$ qstatgrp --help
usage: qstatgrp [-h] [-g <gid>] [-o] [-j]

Display per-workgroup job information akin to Grid Engine qstatgrp

optional arguments:
  -h, --help            show this help message and exit
  -g <gid>, --group <gid>
                        restrict to jobs running under the given workgroup
                        (can be specified multiple times)
  -o, --group-only      display for the workgroup partition only
  -j, --jobs            show all jobs, not an aggregate summary of them
With no options specified, qstatgrp will list a summary of the current utilization of resources for the current workgroup. To see the current utilization for a particular workgroup, use the -g option.
[traine@login01 ~]$ qstatgrp -g ccm_gillespi
PARTITION        NODES   CPUS   MAX NODES   MAX CPUS
------------------------------------------------------------------------
ccm_gillespi         6    216          11        396
standard             3     72           -
TOTAL                9    288
And to see the individual jobs and resources being used, use the -j option, adding -g with a workgroup name to see a particular workgroup.
[(it_css:traine)@login00 ~]$ qstatgrp -g ccm_gillespi -j
PARTITION  JOBID   OWNER   PRIORITY  STATE  NODES  CPUS  MEM  GRES
--------------------------------------------------------------------------------------------------------------------------------------
standard   350230  jyeon       2198  PD         6    95  8G
standard   350231  jyeon       2198  PD         7   100  8G
standard   351092  sanjib      2207  PD         1     6  1G
standard   351279  daksha      2165  PD         5    80  8G
standard   351280  daksha      2165  PD         6    90  8G
standard   350217  jyeon       2121  R         24    90  8G
standard   350202  jyeon       2120  R         19   100  8G
standard   350104  sanjib      2142  R          1     1  1G
standard   344337  daksha      2686  R          6   100  8G
standard   344336  daksha      2650  R         22   100  8G
standard   344335  daksha      2594  R         17    95  8G
standard   344334  daksha      2582  R         21    95  8G
standard   342446  jyeon       2792  R         14   100  8G
standard   342443  jyeon       2678  R         20    95  8G
standard   342356  sanjib      2529  R          5    12  1G
standard   342353  sanjib      2522  R          3     6  1G
standard   334339  sanjib      2670  R          7   108  1G
spreempted
Due to a certain level of inconsistency in Slurm's error logging, preempted jobs are reported as FAILED: if a job does not properly handle the SIGCONT/SIGTERM signals, the default behavior of SIGTERM makes the job exit immediately rather than waiting out the 5-minute grace period that should technically apply when a job is preempted. Therefore, to really determine whether a job has been preempted or not, UD IT has developed a command that tests various conditions and draws a conclusion about preemption.
Also please note that preemption occurs only for jobs submitted to the standard partition.
[traine@login01 ~]$ spreempted -h
usage: spreempted [-h] [--verbose] [--quiet] [--show-jobid]
                  [--sort-properties] [--jobid <job-id>{,<job-id>..}]

Determine if jobs were preempted

optional arguments:
  -h, --help            show this help message and exit
  --verbose, -v         Emit additional information
  --quiet, -q           Do not summarize preemptions, just return non-zero
                        result code if any were preempted
  --show-jobid, -s      Always prefix output lines with job id
  --sort-properties     For verbose output, show properties in alphabetized
                        order
  --jobid <job-id>{,<job-id>..}, -j <job-id>{,<job-id>..}
                        Slurm job id to check; can be used multiple times to
                        check more than one job. For array jobs, use the
                        syntax #_# for individual array indices
[traine@login01 ~]$ spreempted -j 410289
preempted, did not reach grace period limit
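Because --quiet makes spreempted return a non-zero result code when a job was preempted, it can drive logic in a shell script; a minimal sketch (the job id is illustrative):

if ! spreempted -q -j 410289; then
    echo "job 410289 was preempted; consider resubmitting"
fi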