Viewing Job Status Information on Mills
Once a user has been able to submit jobs to the queue – interactive or batch – the user will from time to time want to know what those jobs are doing. Is the job waiting in a queue for resources to become available, or is it executing? How long has the job been executing? How much CPU time or memory has the job consumed? Users can query Grid Engine for job information using the qstat command. The qstat command has a variety of command line options available to customize and filter what information it displays; discussing all of them is beyond the scope of this document. Please see the qstat man page for a detailed description of all options.
With no options provided, qstat defaults to displaying a list of all incomplete jobs submitted by the user. This includes jobs that are waiting in a queue, jobs that are executing, and jobs that are in an error state. The list is presented in a tabular format, with the following columns:
Column | Description |
---|---|
job-ID | Numerical identifier assigned when the job was submitted |
prior | Scheduling priority of the job; a real number between 0 (low) and 1 (high) |
name | The name assigned to the job, e.g. with the -N option to qsub; usually abbreviated to ~10 characters |
user | The owner of the job |
state | Current state of the job (see next table) |
submit/start at | Either the time the job was submitted or the time the job began execution, depending on its state |
queue | The primary queue instance to which the (executing) job has been assigned |
slots | The number of processing cores assigned to the job; see Jobs with Parallelism for more information |
ja-task-ID | The secondary identifier for the job; see Array Jobs for more information |
The different states in which a job may exist are enumerated by the following codes:
State Code | Description |
---|---|
qw | Job is queued and waiting to execute |
t | Job is ready to execute and is transferring to its assigned node. Jobs usually go from t to r quickly, but very large parallel jobs may persist in t for a short time. |
r | Job is executing (running) |
Eqw | An error occurred when Grid Engine attempted to schedule the job, so it has been returned to the qw state |
s | Job has been suspended so that a higher-priority job can preempt it and use its resources. |
d | Job has completed and is being deleted from its queue. |
h | Displayed when a hold has been placed on a job, such that other jobs must complete before it can begin. See Using Job Holds for more information. |
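The codes in the table above can be translated mechanically, e.g. in a script that post-processes qstat output. A minimal sketch (the short descriptions are abbreviations of the table entries, and the helper function name is illustrative):

```shell
# Translate a qstat job-state code into a short description.
describe_state() {
  case "$1" in
    qw)  echo "queued, waiting to execute" ;;
    t)   echo "transferring to assigned node" ;;
    r)   echo "running" ;;
    Eqw) echo "error during scheduling; requeued" ;;
    s)   echo "suspended (preempted)" ;;
    d)   echo "completed, being deleted" ;;
    h*)  echo "held pending other jobs" ;;
    *)   echo "unrecognized state: $1" ;;
  esac
}
describe_state r      # prints: running
describe_state Eqw    # prints: error during scheduling; requeued
```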
The qstat command also allows the user to see job status information for any other cluster user by means of the -u flag. The flag requires a single argument: a username or the wildcard character (\*):

[(it_css:traine)@mills it_css]$ qstat -u traine
  :
[(it_css:traine)@mills it_css]$ qstat -u \*
  :

Specifying the wildcard argument displays job status for all cluster users.
Viewing Status by Hosts
In all the forms discussed above, the output from qstat focuses on jobs. To instead view the status information in a host-centric format, add the -f option to the qstat command. The output from qstat -f is organized by queue instances (thus, also by compute hosts), with the jobs running in a particular queue instance summarized therein:

[(it_css:traine)@mills it_css]$ qstat -f -q 'it_css*'
queuename                      qtype resv/used/tot. load_avg arch       states
---------------------------------------------------------------------------------
it_css-qrsh.q@n015             IP    0/0/24         23.76    lx24-amd64 d
---------------------------------------------------------------------------------
it_css-qrsh.q@n016             IP    0/1/24         4.90     lx24-amd64
  71882 0.56283 QLOGIN     traine       r     09/21/2012 10:01:14     1
---------------------------------------------------------------------------------
  :
---------------------------------------------------------------------------------
it_css.q+@n015                 BP    0/24/24        23.76    lx24-amd64 dP
  71882 0.56283 RhDimer1   traine       r     08/28/2012 11:32:19    24
---------------------------------------------------------------------------------
it_css.q+@n016                 BP    0/4/24         4.90     lx24-amd64
  71882 0.56283 RhDimer1   traine       r     08/28/2012 11:32:19     4
---------------------------------------------------------------------------------
  :
Had the -q option been omitted, the command would have displayed information for every queue instance (and there are many queue instances). The argument to -q is a queue or queue instance name, and may contain wildcard (*) characters as demonstrated here: it_css* matches all queue instances whose names start with it_css. The -ne option is also useful in this regard: it filters the qstat -f output to only those queue instances with jobs that are executing.
As with the job-centric display method, the -u flag can be used to select which users' jobs are displayed. One helpful feature of displaying jobs in host-centric fashion is that parallel jobs that span multiple hosts will be displayed under each queue instance in which they run: the "RhDimer1" job in the example above uses 28 slots across the n015 and n016 hosts.
The host-centric view displays per-job information similar to that which the job-centric view provided. In addition, it displays information about the queue instances and the host associated with the queue instance. Each Grid Engine queue restricts what kinds of jobs it will accept, and this is summarized under the "qtype" (or queue type) heading. The common queue types are:
QStat Letter | Description |
---|---|
B | Batch jobs (via qsub) can run in this queue |
I | Interactive jobs (via qlogin) can run in this queue |
P | Jobs which use a Parallel environment can run in this queue |
Following the queue type is a summary of the slot usage for the queue instance. The third integer is the total number of slots and the second is the number of slots currently in-use; the number of free slots is just the total minus the in-use count. The real number that follows the slot summary is the load on the host associated with the queue instance. The load is calculated by Grid Engine using a formula that can include not only the host's Unix load average but usage level thereon of any resource of which Grid Engine is aware (memory, disk).
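The free-slot arithmetic described above can be sketched in a few lines of shell; the sample resv/used/tot. value is copied from the it_css.q+@n016 line in the example output:

```shell
# Parse a "resv/used/tot." field as printed by `qstat -f` and compute
# the number of free slots (total minus in-use).
slots="0/4/24"    # sample value from the it_css.q+@n016 line above
IFS=/ read -r resv used total <<EOF
$slots
EOF
free=$(( total - used ))
echo "reserved=$resv used=$used total=$total free=$free"
```

With this sample the snippet reports 20 free slots.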
A load displayed as "N/A" is probably due to the host's being offline (shutdown, crashed, etc.). UD IT makes every effort to reboot offline hosts as quickly as possible or repair the system if a hardware fault is the reason for its being offline.
The remaining columns show the processor architecture for the host and the state of the queue instance. Queue states are indicated by a series of letters (just like job states) and the absence of any letters implies the queue is online and without problems.
QStat Letter | State Description |
---|---|
d | The queue instance has been disabled by a system administrator and will accept no new jobs. |
a | The load has exceeded a threshold, producing an alarm on the queue instance; no new jobs will be scheduled until the alarm condition changes. |
u | The host cannot be contacted on the network, leaving it in an unknown state. |
P | The queue instance is subordinate to another queue and jobs running in it may be preempted by jobs entering its superior queue. |
Recall that a load of N/A usually indicates that a host is offline; this is confirmed by the queue instance's also having a state of au.

UD IT disables a queue instance (state d) if the host associated with it requires maintenance: a reboot to apply new software features, the replacement of a failing memory module, etc.
Host Status, Literally
Grid Engine also features a qhost command that can be used to report the status of a host, period: no job information or queue instance information will be displayed:

[(it_css:traine)@mills it_css]$ qhost -h n013 -h n014
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
n013                    lx24-amd64     24 24.10   63.0G    2.2G  126.1G  689.5M
n014                    lx24-amd64     24 24.09   63.0G    2.3G  126.1G     0.0
Without any -h options the qhost command displays information for every host. The command has other options that will be discussed elsewhere; view the qhost man page for a description of all options available.
Grid Engine strives to keep the load (LOAD) less than or equal to the core count (NCPU) on a host. If the load is significantly greater than the number of cores, then the jobs running on that node will likely not be running at optimum efficiency.
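This check can be automated by comparing the LOAD and NCPU columns of qhost output. A sketch, operating on a hand-copied sample rather than a live query (the hostnames and numbers are illustrative):

```shell
# Print any host whose load exceeds its core count (column 4 vs. column 3).
# The input is a captured sample of `qhost` output, not a live query.
awk 'NR > 1 && $4 + 0 > $3 { print $1, "load", $4, "exceeds", $3, "cores" }' <<'EOF'
HOSTNAME ARCH NCPU LOAD
n013 lx24-amd64 24 24.10
n014 lx24-amd64 24 12.02
EOF
```

For this sample only n013 is flagged.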
Often a load far in excess of NCPU is the result of extensive use of swap space (SWAPUS). This is usually the hallmark of jobs that are allocating more memory than the host has physically available to it. Swapping moves data back and forth between memory and hard disk in order to make the host appear to have more memory, at the expense of speed and efficiency. If your jobs are triggering extensive use of swap space, you may need to analyze how your program uses memory and modify the job to better match the resources available.
Full Status for a Job
So far the per-job output has been limited to a name, submit/run time, and some state flags. Grid Engine maintains a far more extensive set of parameters for each job, which can be viewed using the -j «job_id» option:

[(it_css:traine)@mills it_css]$ qstat -j 82518
==============================================================
job_number:                 82518
exec_file:                  job_scripts/82518
submission_time:            Mon Oct  1 10:17:34 2012
owner:                      traine
uid:                        1201
group:                      it_css
gid:                        1002
sge_o_home:                 /home/1201
sge_o_log_name:             traine
sge_o_path:                 /home/1201/bin:/opt/sbin:/usr/sbin:/sbin:/home/1201/bin:/opt/sbin:/usr/sbin:/sbin:/usr/lib64/qt-3.3/bin:/opt/bin:/home/software/GridEngine/6.2u7/bin/lx24-amd64:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin
sge_o_shell:                /bin/bash
sge_o_workdir:              /lustre/work/it_css
sge_o_host:                 mills
account:                    sge
cwd:                        /lustre/work/it_css
merge:                      y
hard resource_list:         idle_resources=0,dev_resources=0,exclusive=1,standby_resources=1,scratch_free=1000000
mail_list:                  traine@mills.hpc.udel.edu
notify:                     FALSE
job_name:                   mpibounce.qs
priority:                   -1023
jobshare:                   0
env_list:
script_file:                mpibounce.qs
parallel environment:       openmpi range: 48
verify_suitable_queues:     1
usage    1:                 cpu=13:48:45, mem=13134.08773 GBs, io=0.07543, vmem=12.830G, maxvmem=12.830G
scheduling info:            (Collecting of scheduler job information is turned off)
For jobs which are actively executing, the usage line displays the accumulated CPU time and memory usage. Note the unit of GBs on the memory usage: just as electricity usage is measured in kilowatt-hours, memory consumption in Grid Engine is a sum over time. Some clusters may have strict accounting that limits users' total CPU or memory usage, or even bills for it. If peak instantaneous memory usage (the vmem and maxvmem properties shown) were the billable quantity, then a program that uses 16 GB of memory for a few seconds would be treated the same as a program that uses 16 GB of memory for five days!
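In other words, the GBs figure is an integral of memory usage over time. A minimal sketch of the arithmetic, using made-up one-second samples (the sample values are purely illustrative):

```shell
# Sum hypothetical per-second memory samples (in GB) to get a GB-seconds
# figure analogous to qstat's "GBs" accounting unit.
awk 'BEGIN {
  n = split("16 16 16 2 2 2 2 2 2 2", mem, " ")  # ten 1-second samples
  total = 0
  for (i = 1; i <= n; i++) total += mem[i]       # each sample is GB * 1 s
  printf "memory integral = %d GB-seconds\n", total
}'
```

A job that briefly touches 16 GB but mostly sits at 2 GB accumulates far fewer GB-seconds than one that holds 16 GB throughout.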
Status in XML Format
For all the examples cited above, the qstat command displays its output in formats that are human readable. Often a human-readable format is difficult for a computer program to understand. Suppose a cluster user found none of the formats summarized above to be to his or her taste. That user could write a program (a Perl or Python script, a C program, etc.) that consumes the output from qstat and transforms it to his/her preference. The qstat command aids in this venture by being able to display any of its output in XML rather than human-readable format. Adding the -xml option to any qstat command enables display in XML format. The XML document structures exported by qstat are outside the scope of this documentation; consult the qstat man page for more details.
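As a sketch of how a program might consume that XML, the snippet below counts jobs in a hand-written sample document. The element names are an assumption modeled loosely on qstat's XML job list; consult the qstat man page for the authoritative schema:

```shell
# Count jobs in a *hand-written* sample of `qstat -xml` output.
# (Element names here are an assumption, not the authoritative schema.)
cat <<'EOF' > qstat-sample.xml
<?xml version='1.0'?>
<job_info>
  <queue_info>
    <job_list state="running">
      <JB_job_number>12568</JB_job_number>
      <state>r</state>
    </job_list>
    <job_list state="running">
      <JB_job_number>12584</JB_job_number>
      <state>r</state>
    </job_list>
  </queue_info>
</job_info>
EOF
grep -c '<job_list' qstat-sample.xml    # number of jobs in the sample
```

A real consumer would use a proper XML parser rather than grep, but the counting idea is the same.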
UD IT Status Commands
UD IT has created additional status summary commands that build on the qstat and qhost commands provided by Grid Engine. In most cases these commands are conveniences that fill in some of the options summarized above for the user. All commands display a terse summary of their options if the -h or --help option is provided with the command.
qjobs
The qjobs command displays job status in a more compact format:

[(it_css:traine)@mills ~]$ qjobs
===============================================================================
JobID      Owner      State      Submitted as
===============================================================================
12568      traine     running    mpibounce.qs
12584      traine     running    QLOGIN
===============================================================================
2 jobs total.
By default only jobs owned by the user issuing the command are displayed. The -g option displays all jobs for the user's current group; additionally providing a group name with -g displays jobs for users who are members of that specific group:

[(it_css:traine)@mills ~]$ qjobs -g sandler_thermo
===============================================================================
JobID      Owner      State      Submitted as
===============================================================================
80270      odmitr     running    g09ex
82518      frey       running    job1
82524      frey       running    job2
===============================================================================
3 jobs total.
The -a option displays jobs for all cluster users.
Nodes Owned by User
Given a user's current group, the qnodes command displays a list of hosts owned by that group. Under the queueing policies adopted by UD IT, users in a group are guaranteed a high level of access to those nodes.
Per-Group QStat and QHost
The qstat and qhost commands can be restricted to only those hosts owned by the user's current group (see qnodes above). The qstatgrp and qhostgrp commands accept the same options as qstat and qhost, respectively, but limit the display to the list of owned nodes (or queues associated with those nodes).
The qstatgrp command by default summarizes usage of all queues to which the user has access given his/her current working group. Adding the -j flag summarizes the jobs executing in those queues rather than summarizing the queues themselves.
The qhostgrp command by default summarizes usage of all hosts to which the user has access given his/her current working group. Adding the -j flag summarizes the jobs (including standby) executing on those hosts rather than summarizing the hosts themselves.
Both qstatgrp and qhostgrp accept a -g «group name» option to limit the display to an arbitrary group (and not just the user's current working group).
Resource-management options
Any large cluster will have many nodes, perhaps with differing resources, e.g., cores, memory, disk space and accelerators. The resources you can request fall into three categories:
- Fixed by the configuration - slots and installed memory
- Set by a load sensor - CPU load averages, memory usage
- Managed by the job scheduler's internal bookkeeping to ensure availability - available memory and floating software licenses
Details by cluster
Managing Jobs
Checking job status
Use the qstat command to check the status of queued jobs. Use the qstat -h or man qstat commands on the login node to view a complete description of available options. Some of the most often-used options are summarized here:
Option | Result |
---|---|
-j «job_id_list» | Displays information for specified job(s) |
-u «user_list» | Displays information for jobs associated with the specified user(s) |
-ext | Displays extended information about jobs |
-t | Shows additional information about subtasks |
-r | Shows resource requirements of jobs |
For example, to list the information for job 62900, type
qstat -j 62900
To list a table of jobs assigned to user traine that displays the resource requirements for each job, type
qstat -u traine -r
With no options, qstat defaults to qstat -u $USER, so you get a table of your own jobs. With the -u option, the qstat command uses the reduced format, with the following columns.
Column header | Description |
---|---|
job-ID | job id assigned to the job |
user | user who owns the job |
name | job name |
state | current job status, including qw(aiting), s(uspended), r(unning), h(old), E(rror), d(eletion) |
submit/start at | submit time (waiting jobs) or start time (running jobs) |
queue | name of the queue the job is assigned to (for running or suspended jobs only) |
slots | number of slots assigned to the job |
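The reduced format is regular enough to post-process with standard tools. A sketch that tallies jobs by state (field 5 is the state column described above); the input is a captured sample rather than live output, and the job names and times are illustrative:

```shell
# Tally job states from sample `qstat -u` output; skip the two header lines.
awk 'NR > 2 { count[$5]++ } END { for (s in count) print s, count[s] }' <<'EOF'
job-ID prior name user state submit/start at queue slots
-----------------------------------------------------------------
99154 0.50661 openmpi-pg traine qw 11/12/2012 14:33:49 144
99160 0.50661 job2 traine r 11/12/2012 15:02:10 it_css.q@n016 24
99161 0.50661 job3 traine r 11/12/2012 15:02:12 it_css.q@n016 24
EOF
```

For this sample the tally is one qw job and two r jobs (awk does not guarantee the print order).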
A more concise listing
The IT-supplied qjobs command provides a more convenient listing of job status.
Command | Description |
---|---|
qjobs | Displays the status of jobs submitted by you |
qjobs -g | Displays the status of jobs submitted by your research group |
qjobs -g «investing_entity» | Displays the status of jobs submitted by members of the named investing-entity |
qjobs -a | Displays the status of jobs submitted by all users |
In all cases the JobID, Owner, State and Name are listed in a table.
Job status is qw
When your job status is qw it means your job is queued and waiting to execute. When you check with qstat you might see something like this:

[(it_css:traine)@mills it_css]$ qstat -u traine
job-ID  prior    name       user   state submit/start at     queue          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
  99154 0.50661 openmpi-pg traine  qw    11/12/2012 14:33:49                  144
Sometimes your job is stuck: it remains in the qw state and never starts running. You can use qalter to poke at the job scheduler to see why your job is not running. For example, to see the last 10 lines of the job scheduler's validation for job 99154, you can type:

[(it_css:traine)@mills it_css]$ qalter -w p 99154 | tail -10
Job 99154 has no permission for cluster queue "puleo-qrsh.q"
Job 99154 has no permission for cluster queue "capsl.q+"
Job 99154 has no permission for cluster queue "spare.q"
Job 99154 has no permission for cluster queue "it_nss-qrsh.q"
Job 99154 has no permission for cluster queue "it_nss.q"
Job 99154 has no permission for cluster queue "it_nss.q+"
Job 99154 Jobs cannot run because only 72 of 144 requested slots are available
Job 99154 Jobs can not run in PE "openmpi" because the resource requirements can not be satified
verification: no suitable queues
In this example, we asked for 144 slots, but only 72 slots are available on the workgroup it_css nodes.

[(it_css:traine)@mills it_css]$ qstatgrp
CLUSTER QUEUE     CQLOAD  USED  RES  AVAIL  TOTAL aoACDPS cdsuE
it_css-dev.q        0.00     0    0     72     72       0     0
it_css-qrsh.q       0.00     0    0     72     72       0     0
it_css.q            0.00     0    0     72     72       0     0
it_css.q+           0.00     0    0     72     72       0     0
standby-4h.q        0.27     0    0   4968   5064       0    96
standby.q           0.27    12    0   4932   5064       0   120
Use qalter to change the attributes of the pending job, such as reducing the number of slots requested to fit within the workgroup it_css nodes, or changing the resources specified to the standby queue so the job can run. For example, let's change the number of slots requested from 144 to 48:

[(it_css:traine)@mills it_css]$ qalter -pe openmpi 48 99154
modified parallel environment of job 99154
modified slot range of job 99154
[(it_css:traine)@mills it_css]$ qstat -u traine
job-ID  prior    name       user   state submit/start at     queue          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
  99154 0.50661 openmpi-pg traine  r     11/12/2012 14:33:49                   48
Another way to get this job running would be to change the resources for the job so it can run in the standby queue. To do this you must specify all resources, since qalter completely replaces any parameters previously specified for the job by that option. In this example, we alter the job to run in the standby queue:

[(it_css:traine)@mills it_css]$ qalter -l idle=0,standby=1 99154
modified hard resource list of job 99154
[(it_css:traine)@mills it_css]$ qstat -u traine
job-ID  prior    name       user   state submit/start at     queue          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
  99154 0.50661 openmpi-pg traine  r     11/12/2012 15:23:52 standby.q@n016   144
qalter can only be used to alter jobs that you own!
Job status is Eqw
When your job status is Eqw it means an error occurred when Grid Engine attempted to schedule the job, so it has been returned to the qw state. When you check with qstat you might see something like this for user traine:

[(it_css:traine)@mills it_css]$ qstat -u traine
job-ID  prior    name       user   state submit/start at     queue          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
 686924 0.50509 openmpi-pg traine  Eqw   08/12/2014 19:38:53                    1
If the state shows Eqw, use qstat -j job_id | grep error to check for the error. Here is an example of what you might see:

[traine@mills ~]$ qstat -j 686924 | grep error
error reason    1:      08/12/2014 22:08:27 [1208:60529]: error: can't chdir to /archive/it_css/traine/ex-openmpi: No such file or directory
This error indicates that some directory or file cannot be found. Verify that the file or directory in question exists, i.e., that you haven't forgotten to create it and that you can see it from both the head node and the compute nodes. If it appears to be okay, then the job may have suffered a transient condition such as a failed NFS automount, the NFS server being temporarily down, or some other filesystem error.
If you understand the reason and can get it fixed, use qmod -cj job_id to clear the error state like this:
[traine@mills ~]$ qmod -cj 686924
and it should eventually run.
Checking queue status
The qstat command can also be used to get status of all queues on the system.
Option | Result |
---|---|
-f | Displays summary information for all queues |
-ne | Suppresses the display of empty queues |
-qs {a|c|d|o|s|u|A|C|D|E|S} | Selects queues to be displayed according to state |
With the -f
option, qstat uses full format, which includes the following columns.
Column header | Description |
---|---|
queuename | Name of the queue instance |
resv/used/total | Number of slots reserved/used/total |
states | current queue state, including a(larm), s(uspended), d(isabled), h(old), E(rror), P(reempted) |
Examples:
List all queues that are unavailable because they are disabled or the slotwise preemption limits have been reached.
qstat -f -qs dP
List the queues associated with the investing entity it_css.
qstat -f | egrep '(queuename|it_css)'
Checking overall queue and node information
You can determine overall queue and node information using the qstatgrp
, qconf
, qnodes
and qhostgrp
commands. Use a command's -h
option to see its command syntax. To obtain information about a group other than your current group, use the -g
option.
Command | Illustrative example |
---|---|
qstatgrp | qstatgrp shows a summary of the status of the owner-group queues of your current workgroup. |
qstatgrp -j | qstatgrp -j shows the status of each job in the owner-group queues that members of your current workgroup submitted. |
qstatgrp -g «investing_entity» | qstatgrp -g it_css shows the status of all the owner-group queues for the it_css investing-entity. |
qstatgrp -j -g «investing_entity» | qstatgrp -j -g it_css shows the status of each job in the owner-group queues that members of the it_css investing-entity submitted. |
qconf -sql | Shows all queues as a list. |
qconf -sq «queue_name»* | qconf -sq it_css* displays the configuration of each owner-group queue for the it_css investing-entity. |
qnodes | qnodes displays the names of your owner-group's nodes. |
qnodes -g «investing_entity» | qnodes -g it_css displays the names of the nodes owned by the it_css investing-entity. |
qhostgrp | qhostgrp displays the current status of your owner-group's nodes. |
qhostgrp -g «investing_entity» | qhostgrp -g it_css displays the current status of the nodes owned by the it_css investing-entity. |
qhostgrp -j -g «investing_entity» | qhostgrp -j -g it_css shows all jobs running (including standby and spillover) in the owner-group nodes for the it_css investing-entity. |
Checking overall usage of resource quotas
Resource quotas are used to help control the standby and spillover queues. Each user has a quota based on the limits set by the standby queue specifications for each cluster, and each workgroup has a per_workgroup quota based on the number of slots purchased by the research group.
Command | Illustrative example |
---|---|
qquota -u «username» | grep standby | qquota -u traine | grep standby displays the current usage of slots by user traine in the standby resources. |
qquota -u \* | grep «investing_entity» | qquota -u \* | grep it_css displays the current usage of slots being used by all members of the it_css investing-entity, the per_workgroup quota. |
The example below gives a snapshot of the slots being used by user traine in the standby queues and the slots being used by all members of the workgroup it_css:

$ qquota -u traine | grep standby
standby_limits/4h          slots=80/800     users traine    queues standby-4h.q
standby_cumulative/default slots=80/800     users traine    queues standby.q,standby-4h.q
$ qquota -u \* | grep it_css
per_workgroup/it_css       slots=141/200    users @it_css   queues it_css.q,spillover.q
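The slots=used/limit fraction in that output can be turned into a remaining-slots count with plain shell parameter expansion; the sample value below is copied from the per_workgroup/it_css line above:

```shell
# Compute remaining slots under a quota from a "slots=141/200" field.
quota="slots=141/200"     # sample from the per_workgroup/it_css line above
used=${quota#slots=}; used=${used%%/*}
limit=${quota##*/}
echo "used=$used limit=$limit remaining=$(( limit - used ))"
```

With this sample, 59 of the workgroup's 200 slots remain available.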
Deleting a job
Use the qdel «job_id» command to remove pending and running jobs from the queue.
For example, to delete job 28000
qdel 28000
If you have a job that remains in the deletion state (d) even after you try to delete it with the qdel command, then try a force deletion with
qdel -f 28000
This will just forget about the job without attempting any cleanup on the node(s) being used.