Managing Jobs on DARWIN

Once a user has submitted jobs to the cluster, whether interactive or batch, the user will from time to time want to know what those jobs are doing. Is the job waiting in a queue for resources to become available, or is it executing? How long has the job been executing? How much CPU time or memory has the job consumed? Users can query Slurm for job information using the squeue command while the job is still active in Slurm. The squeue command has a variety of command-line options to customize and filter what it displays; discussing all of them is beyond the scope of this document. Run squeue --help or man squeue on the login node to view a complete description of the available options.

With no options provided, squeue displays a list of all jobs currently active in Slurm, submitted by any user on the cluster. This includes jobs that are waiting in a queue, jobs that are executing, and jobs that are in an error state. The list is presented in a tabular format with the following columns:

Column	Description
JOBID	Numerical identifier assigned when the job was submitted
PARTITION	The partition to which the job is assigned
NAME	The name assigned to the job
USER	The owner of the job
ST	Current state of the job (see next table)
TIME	Time used by the job so far, in days-hours:minutes:seconds; 0:00 for jobs that have not started
NODES	The number of nodes assigned to the job
NODELIST(REASON)	The list of nodes on which the job is running, or the reason the job is in its current state (other than running)

The different states in which a job may exist are enumerated by the following codes:

State Code	Description
`CA`	Job was explicitly cancelled by the user or system administrator. The job may or may not have been initiated
`CG`	Job is in the process of completing. Some processes on some nodes may still be active
`F`	Job terminated with non-zero exit code or other failure condition
`PD`	Job is awaiting resource allocation
`PR`	Job terminated due to preemption
`R`	Job currently has an allocation and is running
`RD`	Job is being held after its requested reservation was deleted
`RQ`	Completing job is being requeued
`RS`	Job is about to change size
`S`	Job has an allocation, but execution has been suspended and CPUs have been released for other jobs
`TO`	Job terminated upon reaching its time limit

There are many other possible job state codes, which can be found in the official Slurm documentation.
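The state codes above can be tallied to get a quick overview of a set of jobs. The following is a minimal sketch, assuming squeue's -h (no header) and -o "%t" (compact state code) options; the helper name count_states is made up for illustration and is not part of Slurm.

```shell
# count_states: tally state codes read from stdin, one code per line.
# (Hypothetical helper name; not a Slurm or UD IT command.)
count_states() {
    sort | uniq -c | awk '{ print $2, $1 }'
}

# Typical use on a login node (requires Slurm):
#   squeue -h -u "$USER" -o "%t" | count_states
```

Each output line is a state code followed by the number of jobs in that state.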

Checking job status

squeue or ssqueue

Use the squeue command to check the status of queued jobs. The ssqueue command is a UD IT version of squeue that presents its output as an interactive spreadsheet using the curses display library. Each column in the spreadsheet is sized to match the longest string present, rather than being truncated at a fixed width as in the Slurm commands.

Run squeue --help or man squeue on the login node to view a complete description of the available options. The same options work with ssqueue. Some of the most often-used options are summarized here:

Option	Result
`-j` «job_id_list»	Displays information for specified job(s)
`-u` «user_list»	Displays information for jobs associated with the specified user(s)
`--start`	Lists the estimated start time for queued jobs
`-t` «state_list»	Displays jobs that are in the specified state(s)
`--Format`	Customizes the output of squeue

[(it_css:traine)@login00 it_css]$ squeue
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
            488072 ccei_biom    P-irc rameswar CG 3-23:26:11      1 r01n41
            496264       clj NOVA_Hcs delgorio PD       0:00      4 (QOSGrpCpuLimit)
            496312       clj NOVA_Hcs delgorio PD       0:00      4 (QOSGrpCpuLimit)
            502942  standard run.a8.p     heil PD       0:00      4 (Resources)
            502954  standard run.s8.p     heil PD       0:00      3 (Priority)
            502970  standard PEG44_EQ  utkarsk PD       0:00      2 (Priority)
            502821  standard R3_FineM   yashar PD       0:00      2 (Priority)
            502822  standard B9_ResCo   yashar PD       0:00      2 (Priority)
            502823  standard R3_ResCo   yashar PD       0:00      2 (Priority)
            502979  standard MOH_2Pro  utkarsk PD       0:00      2 (Priority)
            502967  standard run_442c     heil PD       0:00      1 (Dependency)
            502975  standard run.s10.     heil PD       0:00      1 (Dependency)
            496263       clj NOVA_Hcs delgorio  R    2:33:10      4 r00n[25-26,36-37]
            501784 elliottla fluidvis     safa  R 1-02:51:00      1 r01n53
            498144 kukulka_l ph005crc txthoman  R 2-21:01:27      1 r00n17
            498436 biophysic      CdI     cyxu  R 1-21:29:05      1 r01g04
 
            .
            .
            .

The squeue command can also show the status of all current jobs for a particular user by using the -u flag followed by the username. For example,

[(it_css:traine)@login00 ~]$ squeue -u traine
             JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
               913  standard     date   traine PD       0:00      1 (QOSMaxJobsPerUserLimit)
               912  standard openmp_j   traine  R       0:23      1 r00n45
               911  standard       sh   traine  R       1:14      1 r00n56
               910  standard       sh   traine  R       3:30      1 r00n56

shows only the jobs currently in Slurm for user traine. Jobs 910, 911, and 912 are currently running. Job 913 is pending because of a per-user job limit (QOSMaxJobsPerUserLimit) enforced by the quality-of-service (QOS) level associated with its partition (see the section on Partitions for discussion of these limits).

Notably absent from both the standard and long format for squeue are the resources (CPU cores, memory, GPUs) allocated to the job. Historically, Slurm was heavily focused on scheduling nodes themselves and not the component resources on those nodes, and this is reflected in the data it tends to display by default. The --Format flag must be used to augment the displayed information:

[(it_css:traine)@login00 ~]$ squeue --Format="jobid,name,state,partition,account,username,numnodes,numtasks,numcpus,gres"
 
JOBID               NAME                STATE               PARTITION           ACCOUNT             USER                NODES               TASKS               CPUS                GRES                
922                 date                PENDING             devel               it_nss              frey                1                   1                   1                   (null)              
921                 sh                  RUNNING             devel               it_nss              frey                1                   2                   2                   gpu:p100            
918                 openmp_job          RUNNING             standard            it_nss              frey                1                   1                   4                   (null)              
915                 sh                  RUNNING             devel               it_nss              frey                1                   4                   4                   (null)
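The numcpus field shown above can also be summed to see how many CPU cores a user's running jobs hold in total. This is a sketch only: it assumes squeue's -h (no header) and -t (state filter) options, and the helper name total_cpus is hypothetical.

```shell
# total_cpus: add up the first column of numbers read from stdin.
# Prints 0 when there is no input. (Hypothetical helper name.)
total_cpus() {
    awk '{ total += $1 } END { print total + 0 }'
}

# Typical use on a login node (requires Slurm):
#   squeue -h -u "$USER" -t R --Format="numcpus" | total_cpus
```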

Job status is PD

When your job status is PD, your job is queued and waiting to execute. When you check with squeue, you might see something like this:

[(it_css:traine)@traine it_css]$ squeue -u traine
     JOBID PARTITION     NAME     USER ST       TIME  NODES NODELIST(REASON)
      1354  standard     A3WF   traine PD       0:00      1 (Priority)

Sometimes a job appears stuck: it remains in the PD state and does not start running. You can use squeue with the -j and --start options to see the estimated start time of your pending job.

[(it_css:traine)@traine it_css]$ squeue -j 91487 --start
  JOBID PARTITION     NAME     USER  ST           START_TIME  NODES NODELIST(REASON)
  91487 standard   hello_te     traine  PD        2018-07-09T13:09:40       2 (Priority)
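To see the reason and the estimated start time for every pending job at once, a custom output format can be used. The sketch below assumes the squeue format specifiers %i (job ID), %r (reason), and %S (actual or expected start time); the function name pending_report is hypothetical.

```shell
# pending_report: list a user's pending jobs as "jobid reason start_time".
# Defaults to the current user when no username is given.
# (Hypothetical helper name; it only wraps squeue.)
pending_report() {
    squeue -h -u "${1:-$USER}" -t PD -o "%i %r %S"
}

# Typical use on a login node (requires Slurm):
#   pending_report            # your own pending jobs
#   pending_report traine     # pending jobs for user traine
```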

sstat or ssstat

To check the status information of a running job or job step, use the sstat command with appropriate options. The ssstat command is a UD IT version of sstat that presents its output as an interactive spreadsheet using the curses display library. Each column in the spreadsheet is sized to match the longest string present, rather than being truncated at a fixed width as in the Slurm commands.

Some of the useful options are listed below, but keep in mind that you can only use this command for jobs you own. The same options work with ssstat. A detailed list and explanation of the options and column values can be found in the man page, man sstat.

Option	Result
`-a`	Prints all steps for the given job(s) when no step is specified
`-j` «job_id_list»	Displays information for the specified job(s) or step(s)
`-i`	Predefined format that lists the PIDs running for each job step (JobId,Nodes,Pids)
`-p`	Output is `|` delimited, with a `|` at the end
`--format`	Customize output of sstat

[(it_css:traine)@login00 it_css]$ sstat -p --format=AveCPU,AvePages,AveRSS,AveVMSize,JobID -j 1354
     25:02.000|0K|1.37M|5.93M|9.0|
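The pipe-delimited output produced by -p is easy to post-process. As a rough sketch, the filter below turns a header line plus value lines into name=value pairs. It assumes the header line is present (that is, the no-header option was not used) and that each line ends with a trailing |, as -p produces. The name sstat_pairs is made up for this example.

```shell
# sstat_pairs: convert "-p" style output (header line, then value lines,
# each ending in "|") into name=value lines.
# (Hypothetical helper name.)
sstat_pairs() {
    awk -F'|' '
        NR == 1 { for (i = 1; i < NF; i++) name[i] = $i; next }
                { for (i = 1; i < NF; i++) print name[i] "=" $i }
    '
}

# Typical use on a login node (requires Slurm and a running job you own):
#   sstat -p --format=AveCPU,AveRSS,JobID -j 1354 | sstat_pairs
```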

scontrol

scontrol is used for monitoring and modifying queued jobs, as well as holding and releasing jobs. One of its most useful subcommands is scontrol show job followed by a JobID.

By default, squeue limits its per-job output to a name, submit/run time, and some state flags. Slurm maintains a far more extensive set of parameters for each job, which can be viewed using scontrol. For example,

[(it_css:traine)@login00 it_css]$ scontrol show job 1354
JobId=1354 JobName=A33_6_WF.qs
   UserId=traine(1111) GroupId=xxxx(10111) MCS_label=N/A
   Priority=2181 Nice=0 Account=thsu QOS=normal
   JobState=RUNNING Reason=None Dependency=(null)
   Requeue=0 Restarts=0 BatchFlag=1 Reboot=0 ExitCode=0:0
   RunTime=05:04:27 TimeLimit=1-00:00:00 TimeMin=N/A
   SubmitTime=2018-08-14T09:56:15 EligibleTime=2018-08-14T09:56:15
   StartTime=2018-08-14T09:56:16 EndTime=2018-08-15T09:56:16 Deadline=N/A
   PreemptTime=None SuspendTime=None SecsPreSuspend=0
   LastSchedEval=2018-08-14T09:56:16
   Partition=standard AllocNode:Sid=login01:19458
   ReqNodeList=(null) ExcNodeList=(null)
   NodeList=r00n16
   BatchHost=r00n16
   NumNodes=1 NumCPUs=20 NumTasks=20 CPUs/Task=1 ReqB:S:C:T=0:0:*:*
   TRES=cpu=20,mem=60G,node=1,billing=49168
   Socks/Node=* NtasksPerN:B:S:C=0:0:*:* CoreSpec=*
   MinCPUsNode=1 MinMemoryCPU=3G MinTmpDiskNode=0
   Features=(null) DelayBoot=00:00:00
   Gres=(null) Reservation=(null)
   OverSubscribe=OK Contiguous=0 Licenses=(null) Network=(null)
   Command=/lustre/scratch/traine/AAd3F.qs
   WorkDir=/lustre/scratch/traine/
   StdErr=/lustre/scratch/traine/ler/%x.o1354
   StdIn=/dev/null
   StdOut=/lustre/scratch/traine/ler/%x.o1354
   Power=
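Because the scontrol output is a series of Key=Value pairs, a single parameter can be pulled out with standard text tools. The following is a minimal sketch; job_field is a hypothetical helper, and it assumes the value of interest contains no spaces (a path with spaces, for instance, would be cut short).

```shell
# job_field: print the value of one Key=Value pair from
# "scontrol show job" output read on stdin.
# Assumes the value contains no spaces. (Hypothetical helper name.)
job_field() {
    tr ' ' '\n' |
        awk -F= -v key="$1" '$1 == key { sub(/^[^=]*=/, ""); print; exit }'
}

# Typical use on a login node (requires Slurm):
#   scontrol show job 1354 | job_field JobState
#   scontrol show job 1354 | job_field TRES
```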

sacct or ssacct

To check the information about a job from history (i.e. a job that has already completed), sacct can be used to fetch various details as long as you know the JobID. The ssacct version is a UD IT version of sacct that provides output as an interactive spreadsheet using the curses display library such that each column in the spreadsheet is sized to match the longest string present, rather than truncating at a fixed width as the Slurm commands do.

[(it_css:traine)@login01 ~]$ sacct -j 10544
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
10544            Mg_FFS   standard  afwallace         10  PREEMPTED      0:0
10544.batch       batch             afwallace         10  CANCELLED     0:15
10544.extern     extern             afwallace         10  COMPLETED      0:0

The above example indicates that the job was preempted from the standard partition. Remember that jobs in the standard partition are preempted to make way for workgroup-specific jobs. To check how much time a job was given to run, use the --format option to display the TimeLimit for JobID 10544.

[(it_css:traine)@login01 ~]$ sacct -j 10544 --format=TimeLimit
 Timelimit
----------
7-00:00:00

From this we can see that the job had 7 days to complete, the maximum time limit on Caviness; however, because the job was running in the standard partition, it was preempted after running for 21 hours, 5 minutes, and 10 seconds. Below is an example requesting additional information such as memory, number of nodes, and list of nodes.

[(it_css:traine)@login01 ~]$ sacct -j 10544 --format=user,jobid,jobname,partition,state,time,start,end,elapsed,MaxRss,MaxVMSize,nnodes,ncpus,nodelist
 abrarq   10544            Mg_FFS   standard  PREEMPTED 7-00:00:00 2018-10-29T19:23:54 2018-10-30T16:29:04   21:05:10                              1         10          r00n25
          10544.batch       batch             CANCELLED            2018-10-29T19:23:54 2018-10-30T16:29:05   21:05:11    219520K    182124K        1         10          r00n25
          10544.extern     extern             COMPLETED            2018-10-29T19:23:54 2018-10-30T16:23:50   20:59:56         4K    107904K        1         10          r00n25
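When checking many jobs, it can help to print only the steps that did not finish normally. The sketch below assumes sacct's -n (no header) and -P (pipe-delimited, no trailing delimiter) options; the helper name unfinished_steps is hypothetical.

```shell
# unfinished_steps: from "JobID|State" lines on stdin, print every
# job or step whose state is not COMPLETED.
# (Hypothetical helper name.)
unfinished_steps() {
    awk -F'|' '$2 != "COMPLETED" { print $1 ": " $2 }'
}

# Typical use on a login node (requires Slurm):
#   sacct -j 10544 -n -P --format=JobID,State | unfinished_steps
```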

Here is another example, showing a job that exited because it did not request enough memory. Slurm output may report this as oom-kill event(s), meaning out of memory, and sacct shows the state as OUT_OF_ME+, a truncated form of OUT_OF_MEMORY.

[traine@login00 ~]$ sacct -j 9176640
       JobID    JobName  Partition    Account  AllocCPUS      State ExitCode
------------ ---------- ---------- ---------- ---------- ---------- --------
9176640             FVM   standard zurakowski          1 OUT_OF_ME+    0:125
9176640.bat+      batch            zurakowski          1 OUT_OF_ME+    0:125
9176640.ext+     extern            zurakowski          1  COMPLETED      0:0
9176640.0          date            zurakowski          1  COMPLETED      0:0

Running ssacct -j 17035767 instead displays the full text, so the OUT_OF_MEMORY state is clearly visible:

┌──────────────────────────────────────────────────────────────────────────────┐
│JobID           │ JobName │ Partition │ Account │ AllocCPUS │ State         │ │
│────────────────┼─────────┼───────────┼─────────┼───────────┼───────────────┼─│
│17035767        │ sbatch  │ devel     │ it_nss  │         1 │ OUT_OF_MEMORY │ │
│17035767.batch  │ batch   │           │ it_nss  │         1 │ OUT_OF_MEMORY │ │
│17035767.extern │ extern  │           │ it_nss  │         1 │ COMPLETED     │ │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│──────────────────────────────────────────────────────────────────────────────│
│ [Q]uit   [P]rev/[N]ext page   Page [L]eft/[R]ight   [E]nd/[B]eginning of list│
└──────────────────────────────────────────────────────────────────────────────┘

Pressing R to move right shows more columns:

┌──────────────────────────────────────────────────────────────────────────────┐
│        │ JobName │ Partition │ Account │ AllocCPUS │ State         │ ExitCode│
│────────┼─────────┼───────────┼─────────┼───────────┼───────────────┼─────────│
│        │ sbatch  │ devel     │ it_nss  │         1 │ OUT_OF_MEMORY │ 0:125   │
│.batch  │ batch   │           │ it_nss  │         1 │ OUT_OF_MEMORY │ 0:125   │
│.extern │ extern  │           │ it_nss  │         1 │ COMPLETED     │ 0:0     │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│                                                                              │
│──────────────────────────────────────────────────────────────────────────────│
│ [Q]uit   [P]rev/[N]ext page   Page [L]eft/[R]ight   [E]nd/[B]eginning of list│
└──────────────────────────────────────────────────────────────────────────────┘

Deleting a job

Use the scancel «job_id» command to remove pending and running jobs from the queue.

For example, to delete job 28000:

  scancel 28000
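scancel can also select jobs by owner and state rather than by job ID. The following is a sketch only, assuming scancel's --user and --state options; cancel_pending is a hypothetical wrapper name. Because it removes every matching job, check the list with squeue first.

```shell
# cancel_pending: cancel all pending jobs that belong to a user.
# Defaults to the current user when no username is given.
# (Hypothetical helper name; it only wraps scancel.)
cancel_pending() {
    scancel --user="${1:-$USER}" --state=PENDING
}

# Typical use on a login node (requires Slurm):
#   squeue -u "$USER" -t PD     # review what would be cancelled
#   cancel_pending
```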

Available Resources

DARWIN uses a fair-share policy based on allocations and implemented with the quality-of-service (QOS) concept of Slurm. These allocations are represented as per-workgroup, per-partition limits via the Slurm QOS facility. Each QOS has its own set of aggregate limits that apply to the jobs of the workgroup involved. Each QOS is usable only on those partitions that explicitly allow it.

Each allocation group (workgroup) has access to certain partitions (CPU and/or GPU) and is charged against the allocation request based on the compute node resources used. See Job Accounting on DARWIN for complete details regarding available resources and usage.

One can determine the nodes available in a particular partition by using the scontrol command. For the standard partition, the output of scontrol show partition standard includes Nodes=r1n[00-47], which means nodes r1n00, r1n01, … , r1n47 are available in the standard partition.

$ scontrol show partition standard
PartitionName=standard
   AllowGroups=ALL AllowAccounts=ALL AllowQos=wg-cpu-1001,wg-cpu-1002
   AllocNodes=ALL Default=YES QoS=part-standard
   DefaultTime=00:30:00 DisableRootJobs=NO ExclusiveUser=NO GraceTime=0 Hidden=NO
   MaxNodes=UNLIMITED MaxTime=2-00:00:00 MinNodes=0 LLN=NO MaxCPUsPerNode=UNLIMITED
   Nodes=r1n[00-47]
   PriorityJobFactor=32768 PriorityTier=32768 RootOnly=NO ReqResv=NO OverSubscribe=NO
   OverTimeLimit=NONE PreemptMode=OFF
   State=UP TotalCPUs=3072 TotalNodes=48 SelectTypeParameters=NONE
   JobDefaults=(null)
   DefMemPerNode=UNLIMITED MaxMemPerNode=UNLIMITED
   TRESBillingWeights=cpu=1.0,mem=0.125G
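The node list can be extracted from this output on its own, for example to feed it to another command. This is a minimal sketch; partition_nodes is a hypothetical helper name, and it relies only on the Nodes= field shown above.

```shell
# partition_nodes: print the node list (the Nodes= field) of a partition.
# (Hypothetical helper name; it wraps "scontrol show partition".)
partition_nodes() {
    scontrol show partition "$1" |
        awk '{ for (i = 1; i <= NF; i++)
                   if ($i ~ /^Nodes=/) { sub(/^Nodes=/, "", $i); print $i } }'
}

# Typical use on a login node (requires Slurm):
#   partition_nodes standard
#   scontrol show hostnames "$(partition_nodes standard)"   # one node name per line
```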

To check the usable memory on each node accessible to a given investing-entity (workgroup), use the qhost command.

sworkgroup and sproject

One can use the sworkgroup command (a UD IT customized utility) to see the limits for the idle partition only. All other available resources for an allocation are tracked via the sproject command (also a UD IT customized utility). See Job Accounting on DARWIN for complete details regarding available resources and usage for your allocation.

[(it_css:traine)@login01 ~]$ sworkgroup --help
usage: sworkgroup [-h] [--workgroup group] [--limits] [--noheader]
                  [--human-readable] [--parseable]
 
Show Slurm partitions and limits for a workgroup
 
optional arguments:
  -h, --help            show this help message and exit
  --workgroup group, -g group
                        display partitions available to a specific workgroup
  --limits, -l          show TRES limits for each partition
  --noheader, -N        do not display column headers on output
  --human-readable, -H  show TRES in a more human-readable format
  --parseable           show as parseable lines rather than a tabular format

[(it_css:anita)@login01.darwin ~]$ sworkgroup -g it_css --limits 
Partition Per user Per job Per workgroup
---------+--------+-------+-------------
idle      cpu=640
[(it_css:anita)@login01.darwin ~]$ sworkgroup -g it_css --human-readable --limits
Partition Per user  Per job Per workgroup
---------+---------+-------+-------------
idle      640 cores

Host Status, Literally (sinfo or ssinfo)

Slurm also features an sinfo command that reports the status of all the nodes in the cluster and nothing else: no job information is displayed. The ssinfo command is a UD IT version of sinfo that presents its output as an interactive spreadsheet using the curses display library. Each column in the spreadsheet is sized to match the longest string present, rather than being truncated at a fixed width as in the Slurm commands. The same options for sinfo work with ssinfo too.

[(it_css:traine)@login00.darwin ~]$ sinfo --long --Node
Mon Apr 26 11:38:42 2021
NODELIST   NODES    PARTITION       STATE CPUS    S:C:T MEMORY TMP_DISK WEIGHT AVAIL_FE REASON
r1n00          1    standard*        idle 64     2:32:1 499712  1800000      1 standard none
r1n01          1    standard*        idle 64     2:32:1 499712  1800000      1 standard none
r1n02          1    standard*        idle 64     2:32:1 499712  1800000      1 standard none
.
.
.
r1n46          1    standard*        idle 64     2:32:1 499712  1800000      1 standard none
r1n47          1    standard*        idle 64     2:32:1 499712  1800000      1 standard none
r1t00          1       gpu-t4        idle 64     2:32:1 491520  1800000   1000 nvidia-g none
r1t01          1       gpu-t4        idle 64     2:32:1 491520  1800000   1000 nvidia-g none
r1t02          1       gpu-t4        idle 64     2:32:1 491520  1800000   1000 nvidia-g none
.
.
.
r1t06          1       gpu-t4        idle 64     2:32:1 491520  1800000   1000 nvidia-g none
r1t07          1       gpu-t4        idle 64     2:32:1 491520  1800000   1000 nvidia-g none
r2e00          1 extended-mem        idle 64     2:32:1 999424  1800000 100000 Optane-N none
r2l00          1    large-mem        idle 64     2:32:1 999424  1800000     10 large-me none
r2l01          1    large-mem        idle 64     2:32:1 999424  1800000     10 large-me none
.
.
.
r2l31          1    large-mem        idle 64     2:32:1 999424  1800000     10 large-me none
r2m00          1     gpu-mi50        idle 64     2:32:1 491520  1800000 100000 amd-gpu, none
r2t08          1       gpu-t4        idle 64     2:32:1 491520  1800000   1000 nvidia-g none
r2v00          1     gpu-v100        idle 48     2:24:1 737280  1800000  10000 nvidia-g none
r2v01          1     gpu-v100        idle 48     2:24:1 737280  1800000  10000 nvidia-g none
r2v02          1     gpu-v100        idle 48     2:24:1 737280  1800000  10000 nvidia-g none
r2x00          1   xlarge-mem        idle 64     2:32:1 203161  1800000    100 xlarge-m none
r2x01          1   xlarge-mem        idle 64     2:32:1 203161  1800000    100 xlarge-m none
r2x02          1   xlarge-mem        idle 64     2:32:1 203161  1800000    100 xlarge-m none
.
.
.
r2x09          1   xlarge-mem        idle 64     2:32:1 203161  1800000    100 xlarge-m none
r2x10          1   xlarge-mem        idle 64     2:32:1 203161  1800000    100 xlarge-m none

The command has other options that will be discussed elsewhere; view the sinfo man page for a description of all options available.

MEMORY refers to the size of memory on each node in megabytes. The S:C:T column gives the extended processor information: the number of sockets, cores per socket, and threads per core (S:C:T) on each node. TMP_DISK is the size of temporary disk space per node in megabytes. The WEIGHT column gives the scheduling weight of the node.

To fetch the information for all the nodes in a particular partition, use the -p option followed by the partition name. Here is the output from ssinfo --long --Node -p large-mem, which shows the full text, especially in the AVAIL_FEATURES column:
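To list only the idle nodes, the node-oriented output can be filtered by state. The sketch below assumes the sinfo format specifiers %N (node name), %P (partition), and %T (state in extended form); idle_nodes is a hypothetical helper name. It matches the state exactly, so nodes whose state carries a suffix flag are skipped.

```shell
# idle_nodes: from "node partition state" lines on stdin, print the
# names of nodes whose state is exactly "idle".
# (Hypothetical helper name.)
idle_nodes() {
    awk '$3 == "idle" { print $1 }'
}

# Typical use on a login node (requires Slurm):
#   sinfo -h -N -o "%N %P %T" | idle_nodes
#   sinfo -h -N -p large-mem -o "%N %P %T" | idle_nodes
```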

┌─────────────────────────────────────────────────────────────────────────┐
│Wed Apr 19 11:18:52 2023                                                 │
│────────────────────────                                                 │
│NODELIST                 │ NODES │ PARTITION │ STATE     │ CPUS │ S:C:T  │
│r2l00                    │ 1     │ large-mem │ allocated │ 64   │ 2:32:1 │
│r2l01                    │ 1     │ large-mem │ allocated │ 64   │ 2:32:1 │
│r2l02                    │ 1     │ large-mem │ allocated │ 64   │ 2:32:1 │
│r2l03                    │ 1     │ large-mem │ allocated │ 64   │ 2:32:1 │
│r2l04                    │ 1     │ large-mem │ allocated │ 64   │ 2:32:1 │
│r2l05                    │ 1     │ large-mem │ allocated │ 64   │ 2:32:1 │
│r2l06                    │ 1     │ large-mem │ allocated │ 64   │ 2:32:1 │
│r2l07                    │ 1     │ large-mem │ allocated │ 64   │ 2:32:1 │
│r2l08                    │ 1     │ large-mem │ allocated │ 64   │ 2:32:1 │
│r2l09                    │ 1     │ large-mem │ allocated │ 64   │ 2:32:1 │
│r2l10                    │ 1     │ large-mem │ allocated │ 64   │ 2:32:1 │
│r2l11                    │ 1     │ large-mem │ allocated │ 64   │ 2:32:1 │
│r2l12                    │ 1     │ large-mem │ allocated │ 64   │ 2:32:1 │
│r2l13                    │ 1     │ large-mem │ allocated │ 64   │ 2:32:1 │
│r2l14                    │ 1     │ large-mem │ allocated │ 64   │ 2:32:1 │
│r2l15                    │ 1     │ large-mem │ allocated │ 64   │ 2:32:1 │
│r2l16                    │ 1     │ large-mem │ allocated │ 64   │ 2:32:1 │
│r2l17                    │ 1     │ large-mem │ allocated │ 64   │ 2:32:1 │
│─────────────────────────────────────────────────────────────────────────│
│ [Q]uit   [P]rev/[N]ext page   Page [L]eft/[R]ight   [E]nd/[B]eginning of│
└─────────────────────────────────────────────────────────────────────────┘

Pressing R to move right shows more columns:

┌─────────────────────────────────────────────────────────────────────────┐
│                                                                         │
│                                                                         │
│ S:C:T  │ MEMORY │ TMP_DISK │ WEIGHT │ AVAIL_FEATURES            │ REASON│
│ 2:32:1 │ 999424 │ 1800000  │ 10     │ large-memory,1024GiB,1TiB │ none  │
│ 2:32:1 │ 999424 │ 1800000  │ 10     │ large-memory,1024GiB,1TiB │ none  │
│ 2:32:1 │ 999424 │ 1800000  │ 10     │ large-memory,1024GiB,1TiB │ none  │
│ 2:32:1 │ 999424 │ 1800000  │ 10     │ large-memory,1024GiB,1TiB │ none  │
│ 2:32:1 │ 999424 │ 1800000  │ 10     │ large-memory,1024GiB,1TiB │ none  │
│ 2:32:1 │ 999424 │ 1800000  │ 10     │ large-memory,1024GiB,1TiB │ none  │
│ 2:32:1 │ 999424 │ 1800000  │ 10     │ large-memory,1024GiB,1TiB │ none  │
│ 2:32:1 │ 999424 │ 1800000  │ 10     │ large-memory,1024GiB,1TiB │ none  │
│ 2:32:1 │ 999424 │ 1800000  │ 10     │ large-memory,1024GiB,1TiB │ none  │
│ 2:32:1 │ 999424 │ 1800000  │ 10     │ large-memory,1024GiB,1TiB │ none  │
│ 2:32:1 │ 999424 │ 1800000  │ 10     │ large-memory,1024GiB,1TiB │ none  │
│ 2:32:1 │ 999424 │ 1800000  │ 10     │ large-memory,1024GiB,1TiB │ none  │
│ 2:32:1 │ 999424 │ 1800000  │ 10     │ large-memory,1024GiB,1TiB │ none  │
│ 2:32:1 │ 999424 │ 1800000  │ 10     │ large-memory,1024GiB,1TiB │ none  │
│ 2:32:1 │ 999424 │ 1800000  │ 10     │ large-memory,1024GiB,1TiB │ none  │
│ 2:32:1 │ 999424 │ 1800000  │ 10     │ large-memory,1024GiB,1TiB │ none  │
│ 2:32:1 │ 999424 │ 1800000  │ 10     │ large-memory,1024GiB,1TiB │ none  │
│ 2:32:1 │ 999424 │ 1800000  │ 10     │ large-memory,1024GiB,1TiB │ none  │
│─────────────────────────────────────────────────────────────────────────│
│ [Q]uit   [P]rev/[N]ext page   Page [L]eft/[R]ight   [E]nd/[B]eginning of│
└─────────────────────────────────────────────────────────────────────────┘

UD IT Status Commands

UD IT has created additional status summary commands that build on the squeue and sacct commands provided by Slurm. In most cases these commands are conveniences that fill in some of the options summarized above for the user. All commands display a terse summary of their options if the -h or --help option is provided.

These commands are not part of the Slurm job scheduling software. They are made available on clusters maintained by UD IT that use the Slurm job scheduler and are not likely to be available on other clusters to which a user has access.

sjobs

The sjobs command displays job status in a more compact format:

[traine@login01 ~]$ sjobs -h
usage: sjobs [-h] [-a] [-G] [-g <workgroup>] [-u <username>] [-t] [-H]
             [--parseable] [-d <string>]
 
Display information for running and queued jobs
 
optional arguments:
  -h, --help            show this help message and exit
 
job selection options:
  -a, --all             show all users (not just yourself)
  -G, --current-group   show jobs for users in your current workgroup
  -g <workgroup>, --group <workgroup>
                        show jobs for users in the given workgroup (can be
                        used multiple times)
  -u <username>, --user <username>
                        show jobs for the given user (can be used multiple
                        times)
 
output options:
  -t, --totals          show total jobs, users, groups, cpus, nodes, and tasks
  -H, --no-header       do not show column headers
  --parseable           do not display in a tabular format
  -d <string>, --delimiter <string>
                        when --parseable is selected, separate columns with
                        this string
 
If no job selection options are provided then jobs for the current user will
be displayed.

[(it_css:traine)@login01.darwin ~]$ sjobs
JOBID USER      STATE  JOBNAME    GROUP  NCPUS NNODES NTASKS
----- -------- ------- ---------- ------ ----- ------ ------
 1904 traine RUNNING arrayJob   it_css     1      1      1
 1903 traine RUNNING openmp_job it_css     8      1      1

By default, only jobs owned by the user issuing the command are displayed. The -g option followed by the group name displays jobs for users who are members of that specific group:

[traine@login01.darwin ~]$ sjobs -g arce
    JOBID USER   STATE  JOBNAME    GROUP NCPUS NNODES NTASKS
--------- ----- ------- ---------- ----- ----- ------ ------
 771536_6 xmdrm RUNNING STAC_train arce      4      1      1
 771536_8 xmdrm RUNNING STAC_train arce      4      1      1
 771536_9 xmdrm RUNNING STAC_train arce      4      1      1
771536_10 xmdrm RUNNING STAC_train arce      4      1      1

The -a option displays jobs for all cluster users.

qhost

qhost is a wrapper written for DARWIN that consolidates information collected from several Slurm commands and displays host/node information in a fashion similar to qhost on Farber.

[traine@login01 ~]$ qhost --help
usage: qhost [-help] [-h <hostlist>] [-ncb] [-j] [-u <username>] [--help]
             [--hosts <hostlist>] [--jobs] [--users <userlist>]
             [--no-node-sort] [--std-host-topo]
 
Display host information akin to Grid Engine qhost
 
original options:
  options inherited from Grid Engine qhost
 
  -help                print this help
  -h <hostlist>        display only selected hosts
  -ncb                 suppress host topology based information
  -j                   display jobs running on the host(s)
  -u <username>        show only jobs for user
 
extended options:
  additional options not present in Grid Engine qhost
 
  --help               alternate to -help
  --hosts <hostlist>   alternate to -h
  --jobs               alternate to -j
  --users <userlist>   alternate to -u
  --no-node-sort, -S   do not sort the list of nodes by name before displaying
  --std-host-topo, -T  show host topology as sockets, cores-per-socket, and
                       threads-per-core
 
A <hostlist> is one or more node names or name patterns (e.g. r00n[00-24])
separated by commas. A <userlist> is one or more user names separated by
commas

The following example shows the specifications for node r1n03. It illustrates that the usable memory on r1n03 is 480G, whereas the physical memory is 488G. This difference matters when specifying the amount of memory as an option, especially when using a workgroup partition. See the details about managing jobs using command options for sbatch: Memory

[(it_css:traine)@login01.darwin ~]$ qhost -h r1n03
HOSTNAME                ARCH         NCPU NSOC NCOR NTHR NLOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
----------------------------------------------------------------------------------------------
r1n03                   standard       64    2   64   64  0.63  488.0G  480.0G     0.0     0.0
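
The usable-memory figure can be pulled out of this listing and turned into a memory request. The sketch below is illustrative only: usable_mem is a hypothetical helper, the qhost output is supplied as sample text copied from the example, and the MEMUSE column is read as usable memory, following the description above.

```shell
# usable_mem is a hypothetical helper: it prints the MEMUSE column (field 9)
# for the named host from qhost-style output read on standard input.
usable_mem() {
    awk -v host="$1" '$1 == host { print $9 }'
}

# Sample listing copied from the "qhost -h r1n03" example.
qhost_sample=$(cat <<'EOF'
HOSTNAME                ARCH         NCPU NSOC NCOR NTHR NLOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
----------------------------------------------------------------------------------------------
r1n03                   standard       64    2   64   64  0.63  488.0G  480.0G     0.0     0.0
EOF
)

mem=$(printf '%s\n' "$qhost_sample" | usable_mem r1n03)
# Drop the fractional part so the value has the integer-plus-unit form
# that the sbatch --mem option accepts.
mem_request="${mem%.*}G"
echo "--mem=${mem_request}"
```

For the sample line this prints --mem=480G, the largest whole-node request that stays within the usable memory reported for r1n03.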

Adding the -j option also shows the jobs running on node r1n03 in this example.

[(it_css:traine)@login01.darwin ~]$ qhost -h r1n03 -j
HOSTNAME                ARCH         NCPU NSOC NCOR NTHR NLOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
----------------------------------------------------------------------------------------------
r1n03                   standard       64    2   64   64  0.63  488.0G  480.0G     0.0     0.0
        job-ID        prior name             user     state submit/start at        queue        master   ja-task-ID
  -----------------------------------------------------------------------------------------------------------------
        771529    176093110 conf             msafrono   R   2021-09-28T16:00:57    idle         SLAVE
                                                                                                SLAVE
                                                                                                SLAVE
                                                                                                SLAVE
                                                                                                SLAVE
                                                                                                SLAVE
                                                                                                SLAVE
                                                                                                SLAVE
                                                                                                SLAVE
                                                                                                SLAVE
                                                                                                SLAVE
                                                                                                SLAVE
                                                                                                SLAVE
                                                                                                SLAVE
                                                                                                SLAVE
                                                                                                SLAVE
                                                                                                SLAVE
                                                                                                SLAVE
                                                                                                SLAVE
                                                                                                SLAVE
                                                                                                SLAVE
                                                                                                SLAVE
                                                                                                SLAVE
                                                                                                SLAVE
        771455     17179808 27476_SAPT       allinson   R   2021-09-28T06:52:57    idle         MASTER
                                                                                                SLAVE
                                                                                                SLAVE
                                                                                                SLAVE
                                                                                                SLAVE
                                                                                                SLAVE
                                                                                                SLAVE
                                                                                                SLAVE
                                                                                                SLAVE
                                                                                                SLAVE
                                                                                                SLAVE
                                                                                                SLAVE
                                                                                                SLAVE
                                                                                                SLAVE
                                                                                                SLAVE
                                                                                                SLAVE

qstatgrp

qstatgrp makes use of the Slurm commands squeue and sacctmgr to display the current utilization of resources within a workgroup. Using the -h or --help argument displays the list of options that can be used with qstatgrp.

[(it_css:traine)@login00 ~]$ qstatgrp --help
usage: qstatgrp [-h] [-g <gid>] [-o] [-j]
 
Display per-workgroup job information akin to Grid Engine qstatgrp
 
optional arguments:
  -h, --help            show this help message and exit
  -g <gid>, --group <gid>
                        restrict to jobs running under the given workgroup
                        (can be specified multiple times)
  -o, --group-only      display for the workgroup partition only
  -j, --jobs            show all jobs, not an aggregate summary of them

With no options specified, qstatgrp lists a summary of the current utilization of resources for the current workgroup. To see the current utilization for a particular workgroup, use the -g option.

[(it_css:traine)@login01.darwin ~]$ qstatgrp -g jayaraman_lab
PARTITION                      NODES        CPUS     MAX MEM    MAX CPUS
------------------------------------------------------------------------
idle                              20         650
- TOTAL                           20         650

To see the individual jobs and the resources they are using for the default workgroup, use the -j option; add -g with a workgroup name to see them for a particular workgroup.

[(it_css:traine)@login01.darwin ~]$ qstatgrp -g jayaraman_lab -j
PARTITION               JOBID                   OWNER       PRIORITY STATE       NODES        CPUS         MEM                    GRES
--------------------------------------------------------------------------------------------------------------------------------------
idle                    771468                  heil        64424266    PD           1           1          1G                     N/A
idle                    771414                  heil        64424266    PD           1           1          1G                     N/A
idle                    771412                  heil        64424266    PD           1           1          1G                     N/A
idle                    771134                  heil        64424266    PD           1           1          1G                     N/A
idle                    771127                  heil        64424266    PD           1           1          1G                     N/A
idle                    771123                  heil        64424266    PD           1           1          1G                     N/A
idle                    771116                  heil        64424266    PD           1           1          1G                     N/A
idle                    771059                  heil        64424266    PD           1           1          1G                     N/A
idle                    771051                  heil        64424266    PD           1           1          1G                     N/A
idle                    770960                  heil        64424266    PD           1           1          1G                     N/A
idle                    771467                  heil        64424272     R           1          64          3G                     N/A
idle                    771413                  heil        42949518     R           1          64          3G                     N/A
idle                    771411                  heil        42949518     R           1          64          3G                     N/A
idle                    771133                  heil        42949518     R           1          64          3G                     N/A
idle                    771126                  heil        42949518     R           1          64          3G                     N/A
idle                    771122                  heil        42949518     R           1          64          3G                     N/A
idle                    771115                  heil        42949518     R           1          64          3G                     N/A
idle                    771058                  heil        38654567     R           1          64          3G                     N/A
idle                    771050                  heil        38654567     R           1          64          3G                     N/A
idle                    770959                  heil        64424272     R           1          64          3G                     N/A
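
A long per-job listing like this one is easier to read when tallied by state. The following is a rough sketch under stated assumptions: state_summary is a hypothetical helper, not a qstatgrp feature, and it is run here on a four-row excerpt of the listing above instead of live output.

```shell
# state_summary is a hypothetical helper: it reads "qstatgrp -j" style output
# and prints, for each job state, the number of jobs and the CPUs they hold.
state_summary() {
    # Fields: 1 PARTITION, 2 JOBID, 3 OWNER, 4 PRIORITY, 5 STATE, 6 NODES, 7 CPUS
    # The first two lines (header and dashed separator) are skipped.
    awk 'NR > 2 && NF >= 7 { jobs[$5]++; cpus[$5] += $7 }
         END { for (s in jobs) printf "%s jobs=%d cpus=%d\n", s, jobs[s], cpus[s] }' | sort
}

# Four-row excerpt of the listing above (two pending jobs, two running jobs).
qstatgrp_sample=$(cat <<'EOF'
PARTITION               JOBID                   OWNER       PRIORITY STATE       NODES        CPUS         MEM                    GRES
--------------------------------------------------------------------------------------------------------------------------------------
idle                    771468                  heil        64424266    PD           1           1          1G                     N/A
idle                    771414                  heil        64424266    PD           1           1          1G                     N/A
idle                    771467                  heil        64424272     R           1          64          3G                     N/A
idle                    771413                  heil        42949518     R           1          64          3G                     N/A
EOF
)

summary=$(printf '%s\n' "$qstatgrp_sample" | state_summary)
echo "$summary"
```

On the cluster, the output of qstatgrp -g <gid> -j could be piped into the helper in place of the excerpt.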

spreempted

Because Slurm's error logging is not fully consistent, preempted jobs are reported as FAILED. This happens when a job does not handle the SIGCONT/SIGTERM signals properly: the default behavior of SIGTERM makes the job exit immediately instead of waiting for the 5-minute grace period that should apply when a job is preempted. To determine whether a job was actually preempted, UD IT has developed the spreempted command, which tests various conditions and reaches a conclusion about preemption.

Also note that preemption applies only to jobs submitted to the idle partition.

[traine@login01.darwin ~]$ spreempted -h
usage: spreempted [-h] [--verbose] [--quiet] [--show-jobid]
                  [--sort-properties] [--jobid <job-id>{,<job-id>..}]
 
Determine if jobs were preempted
 
optional arguments:
  -h, --help            show this help message and exit
  --verbose, -v         Emit additional information
  --quiet, -q           Do not summarize preemptions, just return non-zero
                        result code if any were preempted
  --show-jobid, -s      Always prefix output lines with job id
  --sort-properties     For verbose output, show properties in alphabetized
                        order
  --jobid <job-id>{,<job-id>..}, -j <job-id>{,<job-id>..}
                        Slurm job id to check; can be used multiple times to
                        check more than one job. For array jobs, use the
                        syntax #_# for individual array indices

[traine@login01.darwin ~]$ spreempted -j 410289
preempted, did not reach grace period limit
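
The --quiet option makes spreempted convenient to use from scripts, since the result is carried by the exit code. The sketch below is one possible way to do that. The wrapper name check_preempted is hypothetical, and it assumes that a non-zero result code always means preemption, as the help text describes; an error inside spreempted itself might also produce a non-zero code.

```shell
# check_preempted is a hypothetical wrapper around spreempted.
# It relies on the documented --quiet behaviour: a non-zero result code
# means at least one of the given jobs was preempted.
check_preempted() {
    if spreempted --quiet --jobid "$1"; then
        echo "$1: not preempted"
    else
        echo "$1: preempted"
    fi
}

# Only call the tool where it is installed (for example, on a DARWIN login node).
if command -v spreempted >/dev/null 2>&1; then
    check_preempted 410289
fi
```

A wrapper like this could be called from a resubmission script to decide whether a job that ended in the FAILED state should simply be submitted again.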
