Managing Jobs on Farber

Once a user has been able to submit jobs to the queue – interactive or batch – the user will from time to time want to know what those jobs are doing. Is the job waiting in a queue for resources to become available, or is it executing? How long has the job been executing? How much CPU time or memory has the job consumed? Users can query Grid Engine for job information using the qstat command. The qstat command has a variety of command line options available to customize and filter what information it displays; discussing all of them is beyond the scope of this document. Please see the qstat man page for a detailed description of all options.

With no options provided, qstat defaults to displaying a list of all incomplete jobs submitted by the user. This includes jobs that are waiting in a queue, jobs that are executing, and jobs that are in an error state. The list is presented in a tabular format, with the following columns:

Column	Description
job-ID	Numerical identifier assigned when the job was submitted
prior	Scheduling priority of the job; a real number between 0 (low) and 1 (high)
name	The name assigned to the job, e.g. with the `-N` option to `qsub`; usually abbreviated to about 10 characters
user	The owner of the job
state	Current state of the job (see next table)
submit/start at	Either the time the job was submitted or the time the job began execution, depending on its state
queue	The primary queue instance to which the (executing) job has been assigned
slots	The number of processing cores assigned to the job; see Jobs with Parallelism for more information
ja-task-ID	The secondary identifier for the job; see Array Jobs for more information

The different states in which a job may exist are enumerated by the following codes:

State Code	Description
`qw`	Job is queued and waiting to execute
`t`	Job is ready to execute and is transferring to its assigned node. Jobs usually go from `t` to `r` quickly, but very large parallel jobs may persist in `t` for a short time.
`r`	Job is executing (running)
`Eqw`	An error occurred when Grid Engine attempted to schedule the job, so it has been returned to the `qw` state
`s`	Job has been suspended so that a higher-priority job can preempt it and use its resources.
`d`	Job has completed and is being deleted from its queue.
`h`	Displayed when a hold has been placed on a job, such that other jobs must complete before it can begin. See Using Job Holds for more information.
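The state codes in the table above can be combined in a single state string, such as Eqw. As a minimal sketch, the Python function below expands a state string into the descriptions from the table. The function name describe_state and the wording of the descriptions are illustrative choices, not part of Grid Engine, and the sketch assumes that a combined state is simply the codes written one after another.

```python
# Meanings of the single-letter codes, paraphrased from the table above.
STATE_MEANINGS = {
    "t": "transferring to its node",
    "r": "running",
    "E": "error",
    "s": "suspended",
    "d": "being deleted",
    "h": "on hold",
}


def describe_state(code):
    """Expand a qstat state string such as 'Eqw' into a list of descriptions."""
    parts = []
    i = 0
    while i < len(code):
        if code.startswith("qw", i):
            # 'qw' is treated as one unit, matching the table.
            parts.append("queued and waiting")
            i += 2
        else:
            # Letters not in the table are reported rather than dropped.
            parts.append(STATE_MEANINGS.get(code[i], f"unknown ({code[i]})"))
            i += 1
    return parts


print(describe_state("Eqw"))
print(describe_state("r"))
```

Reporting unrecognized letters instead of raising an error keeps the sketch usable if qstat shows a code that the table does not list.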

The qstat command also allows the user to see job status information for any other cluster user by means of the -u flag. The flag requires a single argument: a username or the wildcard character (\*):

[(it_css:traine)@farber it_css]$ qstat -u traine
   :
[(it_css:traine)@farber it_css]$ qstat -u \*
   :

Specifying the wildcard argument displays job status for all cluster users.

In all forms discussed above the output from qstat focuses on jobs. To instead view the status information in a host-centric format, the -f option should be added to the qstat command. The output from qstat -f is organized by queue instances (thus, also by compute hosts) with jobs running in a particular queue instance summarized therein:

[(it_css:traine)@farber it_css]$ qstat -f -q 'it_css*'
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
it_css-qrsh.q@n015             IP    0/0/24        23.76     lx24-amd64    d  
---------------------------------------------------------------------------------
it_css-qrsh.q@n016             IP    0/1/24         4.90     lx24-amd64   
  71882 0.56283 QLOGIN     traine       r     09/21/2012 10:01:14    1 
---------------------------------------------------------------------------------
  :  
---------------------------------------------------------------------------------
it_css.q+@n015                 BP   0/24/24        23.76     lx24-amd64    dP
  71882 0.56283 RhDimer1   traine       r     08/28/2012 11:32:19    24
---------------------------------------------------------------------------------
it_css.q+@n016                 BP    0/4/24         4.90     lx24-amd64    
  71882 0.56283 RhDimer1   traine       r     08/28/2012 11:32:19     4
---------------------------------------------------------------------------------
  :

Without the -q option, the command would have displayed information for every queue instance (and there are many queue instances). The argument to -q is a queue or queue instance name, and may contain wildcard (*) characters as demonstrated here: it_css* = all queue instances whose name starts with it_css. The -ne option is also useful in this regard: it filters the qstat -f output to only those queue instances with jobs that are executing.

As with the job-centric display, the -u flag controls which users' jobs are displayed. One helpful feature of the host-centric display is that a parallel job spanning multiple hosts appears under each queue instance in which it runs: the "RhDimer1" job in the example above uses 28 slots across the n015 and n016 hosts.
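The 28-slot total quoted for "RhDimer1" is the sum of the slot counts listed under each queue instance. The sketch below shows one way to compute such totals from qstat -f text. It is not a Grid Engine tool: the function name slots_per_job is hypothetical, the sample is an excerpt of the listing above embedded as a string, and the parser assumes that job lines are indented and carry the slot count in their eighth whitespace-separated field.

```python
from collections import defaultdict

# Excerpt of the qstat -f listing above, embedded so the sketch runs anywhere.
SAMPLE = """\
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
it_css-qrsh.q@n016             IP    0/1/24         4.90     lx24-amd64
  71882 0.56283 QLOGIN     traine       r     09/21/2012 10:01:14    1
---------------------------------------------------------------------------------
it_css.q+@n015                 BP   0/24/24        23.76     lx24-amd64    dP
  71882 0.56283 RhDimer1   traine       r     08/28/2012 11:32:19    24
---------------------------------------------------------------------------------
it_css.q+@n016                 BP    0/4/24         4.90     lx24-amd64
  71882 0.56283 RhDimer1   traine       r     08/28/2012 11:32:19     4
"""


def slots_per_job(output):
    """Return (total slots, hosts) for each job, keyed by (job ID, job name)."""
    totals = defaultdict(int)
    hosts = defaultdict(list)
    current_host = None
    for line in output.splitlines():
        if not line.strip() or line.startswith(("-", "queuename")):
            continue  # blank lines, rulers and the header carry no data
        if line[0].isspace():
            fields = line.split()
            if current_host is None or len(fields) < 8:
                continue  # e.g. an elision marker rather than a job line
            key = (fields[0], fields[2])
            totals[key] += int(fields[7])
            hosts[key].append(current_host)
        else:
            # Queue instance line: the name has the form queue@host.
            current_host = line.split()[0].split("@")[-1]
    return dict(totals), dict(hosts)


totals, hosts = slots_per_job(SAMPLE)
for key, total in totals.items():
    print(key, total, hosts[key])
```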

The host-centric view displays per-job information similar to that which the job-centric view provided. In addition, it displays information about the queue instances and the host associated with the queue instance. Each Grid Engine queue restricts what kinds of jobs it will accept, and this is summarized under the "qtype" (or queue type) heading. The common queue types are:

QStat Letter	Description
`B`	Batch jobs (via `qsub`) can run in this queue
`I`	Interactive jobs (via `qlogin`) can run in this queue
`P`	Jobs which use a Parallel environment can run in this queue

Following the queue type is a summary of the slot usage for the queue instance. The first integer is the number of reserved slots, the second is the number of slots currently in use, and the third is the total number of slots; the number of free slots is the total minus the in-use count. The real number that follows the slot summary is the load on the host associated with the queue instance. Grid Engine calculates the load using a formula that can include not only the host's Unix load average but also the usage level of any other resource on that host of which Grid Engine is aware (memory, disk).

A reported load of "N/A" usually means the host is offline (shut down, crashed, etc.). UD IT makes every effort to reboot offline hosts as quickly as possible, or to repair the system if a hardware fault took it offline.
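As a small illustration of the slot arithmetic, the sketch below splits one queue instance line from qstat -f into its fields and derives the number of free slots. The function name parse_queue_line is hypothetical, and the sketch assumes the columns are separated by whitespace in the order shown in the listing above.

```python
def parse_queue_line(line):
    """Parse one queue instance line from qstat -f into a dictionary."""
    fields = line.split()
    name, qtype, slots, load, arch = fields[:5]
    # The states column is empty for a healthy queue instance.
    states = fields[5] if len(fields) > 5 else ""
    reserved, used, total = (int(number) for number in slots.split("/"))
    try:
        load_avg = float(load)
    except ValueError:
        load_avg = None  # no numeric load reported, e.g. for an offline host
    return {
        "name": name,
        "qtype": qtype,
        "reserved": reserved,
        "used": used,
        "total": total,
        "free": total - used,
        "load_avg": load_avg,
        "arch": arch,
        "states": states,
    }


line = "it_css.q+@n016                 BP    0/4/24         4.90     lx24-amd64    "
info = parse_queue_line(line)
print(info["name"], "free slots:", info["free"])
```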

The remaining columns show the processor architecture for the host and the state of the queue instance. Queue states are indicated by a series of letters (just like job states) and the absence of any letters implies the queue is online and without problems.

QStat Letter	State Description
`d`	The queue instance has been disabled by a system administrator and will accept no new jobs.
`a`	The load has exceeded a threshold, producing an alarm on the queue instance; no new jobs will be scheduled until the alarm condition changes.
`u`	The host cannot be contacted on the network, leaving it in an unknown state.
`P`	The queue instance is subordinate to another queue and jobs running in it may be preempted by jobs entering its superior queue.

Recall that a load of N/A usually indicates that a host is offline; this is confirmed by the queue instance's also having a state of au.

UD IT disables a queue instance (state d) if the host associated with it requires maintenance: a reboot to apply new software features, the replacement of a failing memory module, etc.

Grid Engine also provides a qhost command that reports only the status of hosts; no job or queue instance information is displayed:

[(it_css:traine)@farber it_css]$ qhost -h n013 -h n014
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
n013                    lx24-amd64     24 24.10   63.0G    2.2G  126.1G  689.5M
n014                    lx24-amd64     24 24.09   63.0G    2.3G  126.1G     0.0

Without any -h options the qhost command displays information for every host. The command has other options that will be discussed elsewhere; view the qhost man page for a description of all options available.

Grid Engine strives to keep the load (LOAD) less than or equal to the core count (NCPU) on a host. If the load is significantly greater than the number of cores, then the jobs running on that node will likely not be running at optimum efficiency.

Quite often a load much higher than NCPU is the result of extensive use of swap space (SWAPUS). This is usually the hallmark of jobs that are allocating more memory than the host has physically available to it. Swapping moves data back and forth between memory and hard disk in order to make the host appear to have more memory at the expense of speed and efficiency. If your jobs are triggering extensive use of swap space, you may need to analyze how your program uses memory and modify the job to better match the resources available.
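The two checks described above, load relative to NCPU and swap in use, can be sketched in Python against the qhost listing shown earlier. This is an illustration under stated assumptions rather than a supported tool: the names parse_qhost, to_bytes and overloaded are hypothetical, the K/M/G suffixes are assumed to be binary multiples, and the 1.5 factor standing in for "significantly greater" is an arbitrary choice.

```python
# qhost listing from above, embedded so the sketch runs without Grid Engine.
SAMPLE = """\
HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
-------------------------------------------------------------------------------
global                  -               -     -       -       -       -       -
n013                    lx24-amd64     24 24.10   63.0G    2.2G  126.1G  689.5M
n014                    lx24-amd64     24 24.09   63.0G    2.3G  126.1G     0.0
"""

# Assumption: size suffixes are binary multiples.
UNITS = {"K": 1024, "M": 1024**2, "G": 1024**3, "T": 1024**4}


def to_bytes(text):
    """Convert a size such as '689.5M' to bytes; '-' becomes None."""
    if text == "-":
        return None
    if text[-1] in UNITS:
        return float(text[:-1]) * UNITS[text[-1]]
    return float(text)


def parse_qhost(output):
    """Return one dictionary per host that reports a core count and a load."""
    hosts = []
    for line in output.splitlines()[2:]:  # skip the header and the ruler
        fields = line.split()
        if len(fields) != 8 or "-" in (fields[2], fields[3]):
            continue  # the 'global' line and hosts that report no data
        name, _arch, ncpu, load, _memtot, _memuse, _swapto, swapus = fields
        hosts.append({
            "host": name,
            "ncpu": int(ncpu),
            "load": float(load),
            "swap_used": to_bytes(swapus),
        })
    return hosts


def overloaded(host, factor=1.5):
    """True when the load exceeds factor times the core count."""
    return host["load"] > factor * host["ncpu"]


for host in parse_qhost(SAMPLE):
    print(host["host"], "overloaded:", overloaded(host),
          "swap in use:", bool(host["swap_used"]))
```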

So far the per-job output has been limited to a name, submit/run time, and some state flags. Grid Engine maintains a far more extensive set of parameters for each job which can be viewed using the -j «job_id» option:

[(it_css:traine)@farber it_css]$ qstat -j 82518
==============================================================
job_number:                 82518
exec_file:                  job_scripts/82518
submission_time:            Mon Oct  1 10:17:34 2012
owner:                      traine
uid:                        1201
group:                      it_css
gid:                        1002
sge_o_home:                 /home/1201
sge_o_log_name:             traine
sge_o_path:                 /home/1201/bin:/opt/sbin:/usr/sbin:/sbin:/home/1201/bin:/opt/sbin:/usr/sbin:/sbin:/usr/lib64/qt-3.3/bin:/opt/bin:/home/software/GridEngine/6.2u7/bin/lx24-amd64:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin
sge_o_shell:                /bin/bash
sge_o_workdir:              /lustre/work/it_css
sge_o_host:                 farber
account:                    sge
cwd:                        /lustre/work/it_css
merge:                      y
hard resource_list:         idle_resources=0,dev_resources=0,exclusive=1,standby_resources=1,scratch_free=1000000
mail_list:                  traine@farber.hpc.udel.edu
notify:                     FALSE
job_name:                   mpibounce.qs
priority:                   -1023
jobshare:                   0
env_list:                   
script_file:                mpibounce.qs
parallel environment:  openmpi range: 48
verify_suitable_queues:     1
usage    1:                 cpu=13:48:45, mem=13134.08773 GBs, io=0.07543, vmem=12.830G, maxvmem=12.830G
scheduling info:            (Collecting of scheduler job information is turned off)

For jobs which are actively executing, the usage line displays the accumulated CPU time and memory usage. Note the unit of GBs on the memory usage: just as electricity usage is measured in kilowatt-hours, memory consumption in Grid Engine is a sum over time. Some clusters may have strict accounting that limits users' total CPU or memory usage or even bills for it. If peak instantaneous memory usage (the vmem and maxvmem properties shown) were the billable quantity, then a program that uses 16 GB of memory for a few seconds would be treated the same as a program that uses 16 GB of memory for five days!
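To make the comparison concrete, the sketch below works through the arithmetic for the two hypothetical programs and shows how the usage line might be split into its fields. The function names are illustrative, and the memory integral is simplified by assuming the memory footprint stays constant for the whole run.

```python
def hms_to_seconds(text):
    """Convert 'HH:MM:SS' or 'D:HH:MM:SS' (as in cpu=13:48:45) to seconds."""
    parts = [int(part) for part in text.split(":")]
    parts = [0] * (4 - len(parts)) + parts  # pad missing leading fields
    days, hours, minutes, seconds = parts
    return ((days * 24 + hours) * 60 + minutes) * 60 + seconds


def parse_usage(text):
    """Split the value of a 'usage' line into a dictionary of strings."""
    return dict(item.split("=", 1) for item in text.split(", "))


def gb_seconds(gigabytes, seconds):
    """Memory integral for a job holding a constant amount of memory."""
    return gigabytes * seconds


usage = parse_usage(
    "cpu=13:48:45, mem=13134.08773 GBs, io=0.07543, vmem=12.830G, maxvmem=12.830G"
)
print(hms_to_seconds(usage["cpu"]))        # 49725 seconds of CPU time

short_job = gb_seconds(16, 5)              # 16 GB held for five seconds
long_job = gb_seconds(16, 5 * 24 * 3600)   # 16 GB held for five days
print(short_job, long_job)                 # 80 6912000
```

Both programs have the same peak of 16 GB, but the time-integrated figures differ by a factor of 86400, which is why the integral is the fairer accounting quantity.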

For all examples cited above the qstat command displays its output in formats that are human readable. Often a human-readable format is difficult for a computer program to understand. Suppose a cluster user found none of the formats summarized above to be to his or her taste. That user could write a program (a Perl or Python script, a C program, etc.) that consumes the output from qstat and transforms it to his/her preference. The qstat command aids in this venture by being able to display any of its output in XML rather than human-readable format. Adding the -xml option to any qstat command enables display in XML format. The XML document structures exported by qstat are outside the scope of this documentation; consult the qstat man page for more details.
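A minimal sketch of such a script in Python follows. Because the XML structure is not documented here, the sample document and its element names (job_list, JB_job_number, JB_name, JB_owner, state, slots) are assumptions about what qstat -xml emits; check them against real output on the cluster before relying on them. The function name jobs_from_xml is likewise hypothetical.

```python
import xml.etree.ElementTree as ET

# Assumed shape of `qstat -xml` output; verify against the real command.
SAMPLE_XML = """
<job_info>
  <queue_info>
    <job_list state="running">
      <JB_job_number>12568</JB_job_number>
      <JB_name>mpibounce.qs</JB_name>
      <JB_owner>traine</JB_owner>
      <state>r</state>
      <slots>48</slots>
    </job_list>
  </queue_info>
  <job_info>
    <job_list state="pending">
      <JB_job_number>12584</JB_job_number>
      <JB_name>job2</JB_name>
      <JB_owner>traine</JB_owner>
      <state>qw</state>
      <slots>1</slots>
    </job_list>
  </job_info>
</job_info>
"""


def jobs_from_xml(xml_text):
    """Collect one dictionary per job_list element, wherever it is nested."""
    root = ET.fromstring(xml_text)
    jobs = []
    for job in root.iter("job_list"):
        jobs.append({
            "id": job.findtext("JB_job_number"),
            "name": job.findtext("JB_name"),
            "owner": job.findtext("JB_owner"),
            "state": job.findtext("state"),
            "slots": job.findtext("slots"),
        })
    return jobs


for job in jobs_from_xml(SAMPLE_XML):
    print(job["id"], job["owner"], job["state"], job["name"])
```

On a cluster, the text passed to jobs_from_xml would come from running qstat -xml, for example through Python's subprocess module, instead of from the embedded sample.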

UD IT has created additional status summary commands that build on the qstat and qhost commands provided by Grid Engine. In most cases these commands are conveniences that fill in some of the options that were summarized above for the user. All commands display a terse summary of their options if the -h or --help option is provided with the command.

These commands are not part of the Grid Engine job scheduling software. They are made available on clusters maintained by UD IT that use the Grid Engine job scheduler and are not likely to be available on other clusters to which a user has access.

The qjobs command displays job status in a more compact format:

[(it_css:traine)@farber ~]$ qjobs
===============================================================================
JobID  Owner              State    Submitted as
===============================================================================
12568  traine             running  mpibounce.qs
12584  traine             running  QLOGIN
===============================================================================
2 jobs total.

By default only jobs owned by the user issuing the command are displayed. The -g option displays all jobs for the user's current group; additionally providing a group name with -g displays jobs for users who are members of that specific group:

[(it_css:traine)@farber ~]$ qjobs -g sandler_thermo
===============================================================================
JobID  Owner              State    Submitted as
===============================================================================
80270  odmitr             running  g09ex
82518  frey               running  job1
82524  frey               running  job2
===============================================================================
3 jobs total.

The -a option displays jobs for all cluster users.

Given a user's current group, the qnodes command displays a list of hosts owned by that group. Under the queueing policies adopted by UD IT, users in a group are guaranteed a high level of access to those nodes.

The qstat and qhost commands can be restricted to only those hosts owned by the user's current group (see qnodes above). The qstatgrp and qhostgrp commands accept the same options as qstat and qhost, respectively, but limit the display to the list of owned nodes (or queues associated with those nodes).

The qstatgrp command by default summarizes usage of all queues to which the user has access given his/her current working group. Adding the -j flag summarizes the jobs executing in those queues rather than summarizing the queues themselves.

The qhostgrp command by default summarizes usage of all hosts to which the user has access given his/her current working group. Adding the -j flag summarizes the jobs (including standby) executing on those hosts rather than summarizing the hosts themselves.

Both qstatgrp and qhostgrp accept a -g «group name» option to limit to an arbitrary group (and not just the user's current working group).

Any large cluster will have many nodes with perhaps differing resources, e.g., cores, memory, disk space and accelerators. The resources you can request fall into three categories:

Fixed by the configuration: slots and installed memory
Set by load sensors: CPU load averages and memory usage
Managed by the job scheduler's internal bookkeeping to ensure availability: available memory and floating software licenses

Details by cluster

Farber

Use the qstat command to check the status of queued jobs. Use the qstat -h or man qstat commands on the login node to view a complete description of available options. Some of the most often-used options are summarized here:

Option	Result
`-j` «job_id_list»	Displays information for specified job(s)
`-u` «user_list»	Displays information for jobs associated with the specified user(s)
`-ext`	Displays extended information about jobs
`-t`	Shows additional information about subtasks
`-r`	Shows resource requirements of jobs

For example, to list the information for job 62900, type

qstat -j 62900

To list a table of jobs assigned to user traine that displays the resource requirements for each job, type

qstat -u traine -r

With no options, qstat defaults to qstat -u $USER, so you get a table of your own jobs. With the -u option, the qstat command uses the reduced format with the following columns.

Column header	Description
`job-ID`	job id assigned to the job
`user`	user who owns the job
`name`	job name
`state`	current job status, including qw(aiting), s(uspended), r(unning), h(old), E(rror), d(eletion)
`submit/start at`	submit time (waiting jobs) or start time (running jobs)
`queue`	name of the queue the job is assigned to (for running or suspended jobs only)
`slots`	number of slots assigned to the job

A more concise listing

The IT-supplied qjobs command provides a more convenient listing of job status.

Command	Description
`qjobs`	Displays the status of jobs submitted by you
`qjobs -g`	Displays the status of jobs submitted by your research group
`qjobs -g` «investing_entity»	Displays the status of jobs submitted by members of the named investing-entity
`qjobs -a`	Displays the status of jobs submitted by all users

In all cases the JobID, Owner, State and Name are listed in a table.

Job status is qw

When your job status is qw it means your job is queued and waiting to execute. When you check with qstat you might see something like this

[(it_css:traine)@farber it_css]$ qstat -u traine
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
  99154 0.50661 openmpi-pg traine       qw    11/12/2012 14:33:49                                  144

Sometimes your job is stuck and remains in the qw state and never starts running. You can use qalter to poke at the job scheduler to see why your job is not running. For example, to see the last 10 lines of the job scheduler validation for job 99154, you can type

[(it_css:traine)@farber it_css]$ qalter -w p 99154 | tail -10
Job 99154 has no permission for cluster queue "puleo-qrsh.q"
Job 99154 has no permission for cluster queue "capsl.q+"
Job 99154 has no permission for cluster queue "spare.q"
Job 99154 has no permission for cluster queue "it_nss-qrsh.q"
Job 99154 has no permission for cluster queue "it_nss.q"
Job 99154 has no permission for cluster queue "it_nss.q+"
Job 99154 Jobs cannot run because only 72 of 144 requested slots are available
Job 99154 Jobs can not run in PE "openmpi" because the resource requirements can not be satified
verification: no suitable queues

In this example, we asked for 144 slots, but only 72 slots are available on the nodes owned by workgroup it_css:

[(it_css:traine)@farber it_css]$ qstatgrp
CLUSTER QUEUE                   CQLOAD   USED    RES  AVAIL  TOTAL aoACDPS  cdsuE
it_css-dev.q                      0.00      0      0     72     72      0      0
it_css-qrsh.q                     0.00      0      0     72     72      0      0
it_css.q                          0.00      0      0     72     72      0      0
it_css.q+                         0.00      0      0     72     72      0      0
standby-4h.q                      0.27      0      0   4968   5064      0     96
standby.q                         0.27     12      0   4932   5064      0    120
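When the validation output is long, the slot shortfall can be picked out programmatically. The sketch below is only an illustration: the function name slot_shortfall is hypothetical, and the regular expression assumes the message is worded exactly as in the qalter -w p output shown above.

```python
import re

# A few lines of the validation output above, embedded for a self-contained run.
SAMPLE = """\
Job 99154 has no permission for cluster queue "it_nss.q+"
Job 99154 Jobs cannot run because only 72 of 144 requested slots are available
verification: no suitable queues
"""

# Assumes the message wording shown in the example output.
SLOTS_RE = re.compile(r"only (\d+) of (\d+) requested slots are available")


def slot_shortfall(text):
    """Return available, requested and missing slot counts, or None."""
    match = SLOTS_RE.search(text)
    if match is None:
        return None
    available, requested = (int(number) for number in match.groups())
    return {
        "available": available,
        "requested": requested,
        "short_by": requested - available,
    }


print(slot_shortfall(SAMPLE))
```

The available count is an upper bound to keep in mind when choosing a smaller slot request for the pending job.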

Use qalter to change the attributes of the pending job so that it can run: either reduce the number of slots requested to fit within the workgroup it_css nodes, or change the requested resources so that the job runs in the standby queue. For example, to request 48 slots instead of 144, use

[(it_css:traine)@farber it_css]$ qalter -pe openmpi 48 99154
modified parallel environment of job 99154
modified slot range of job 99154
[(it_css:traine)@farber it_css]$ qstat -u traine
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
  99154 0.50661 openmpi-pg traine       r     11/12/2012 14:33:49                                  48

Another way to get this job running would be to change the resource for the job to run in the standby queue. To do this you must specify all resources since qalter completely replaces any parameters previously specified for the job by that option. In this example, we alter the job to run in the standby queue by using

[(it_css:traine)@farber it_css]$ qalter -l idle=0,standby=1 99154
modified hard resource list of job 99154
[(it_css:traine)@farber it_css]$ qstat -u traine
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
  99154 0.50661 openmpi-pg traine       r     11/12/2012 15:23:52 standby.q@n016                   144

qalter can only be used to alter jobs that you own!

Job status is Eqw

When your job status is Eqw it means an error occurred when Grid Engine attempted to schedule the job, so it has been returned to the qw state. When you check with qstat you might see something like this for user traine

[(it_css:traine)@farber it_css]$ qstat -u traine
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
 686924 0.50509 openmpi-pg traine       Eqw   08/12/2014 19:38:53                                                              1

If the state shows Eqw, then use qstat -j job_id | grep error to check for the error. Here is an example of what you might see

[traine@farber ~]$ qstat -j 686924 | grep error
error reason    1:          08/12/2014 22:08:27 [1208:60529]: error: can't chdir to /archive/it_css/traine/ex-openmpi: No such file or directory

This error indicates that a directory or file cannot be found; here it is the job's working directory. Verify that the file or directory in question exists, i.e., that you haven't forgotten to create it and that you can see it from both the head node and the compute nodes. If it appears to be okay, then the job may have suffered a transient condition such as a failed NFS automount, a temporarily unavailable NFS server, or some other filesystem error.

If you understand the reason and can get it fixed, use qmod -cj job_id to clear the error state like this:

[traine@farber ~]$ qmod -cj 686924

and it should eventually run.

The qstat command can also be used to get status of all queues on the system.

Option	Result
`-f`	Displays summary information for all queues
`-ne`	Suppresses the display of empty queues.
`-qs` {a|c|d|o|s|u|A|C|D|E|S}	Selects queues to be displayed according to state

With the -f option, qstat uses full format, which includes the following columns.

Column header	Description
`queuename`	name of the queue instance (queue@host)
`resv/used/tot.`	Number of slots reserved/used/total
`states`	current state of the queue instance, including a(larm), s(uspended), d(isabled), u(nknown), E(rror), P(reempted)

Examples:

List all queues that are unavailable because they are disabled or the slotwise preemption limits have been reached.

qstat -f -qs dP

List the queues associated with the investing entity it_css.

qstat -f | egrep '(queuename|it_css)'

You can determine overall queue and node information using the qstatgrp, qconf, qnodes and qhostgrp commands. Use a command's -h option to see its command syntax. To obtain information about a group other than your current group, use the -g option.

Command	Illustrative example
`qstatgrp`	`qstatgrp` shows a summary of the status of the owner-group queues of your current workgroup.
`qstatgrp -j`	`qstatgrp -j` shows the status of each job in the owner-group queues that members of your current workgroup submitted.
`qstatgrp -g` «investing_entity»	`qstatgrp -g it_css` shows the status of all the owner-group queues for the it_css investing-entity.
`qstatgrp -j -g` «investing_entity»	`qstatgrp -j -g it_css` shows the status of each job in the owner-group queues that members of the it_css investing-entity submitted.
`qconf -sql`	Shows all queues as a list.
`qconf -sq` «queue_name»	`qconf -sq it_css*` displays the configuration of each owner-group queue for the it_css investing-entity.
`qnodes`	`qnodes` displays the names of your owner-group's nodes
`qnodes -g` «investing_entity»	`qnodes -g it_css` displays the name of the nodes owned by the it_css investing-entity.
`qhostgrp`	`qhostgrp` displays the current status of your owner-group's nodes
`qhostgrp -g` «investing_entity»	`qhostgrp -g it_css` displays the current status of the nodes owned by the it_css investing-entity.
`qhostgrp -j -g` «investing_entity»	`qhostgrp -j -g it_css` shows all jobs running (including standby and spillover) in the owner-group nodes for the it_css investing-entity.

Resource quotas are used to help control the standby and spillover queues. Each user has a quota based on the limits set by the standby queue specifications for each cluster, and each workgroup has a per_workgroup quota based on the number of slots purchased by the research group.

Command	Illustrative example
`qquota -u` «username» `| grep standby`	`qquota -u traine | grep standby` displays the current usage of slots by user traine in the standby resources.
`qquota -u \* | grep` «investing_entity»	`qquota -u \* | grep it_css` displays the slots currently in use by all members of the it_css investing-entity (the per_workgroup quota).

The example below gives a snapshot of the slots being used by user traine in the standby queues and the slots being used by all members of the workgroup it_css:

$ qquota -u traine | grep standby
standby_limits/4h  slots=80/800         users traine queues standby-4h.q
standby_cumulative/default slots=80/800         users traine queues standby.q,standby-4h.q
$ qquota -u \* | grep it_css
per_workgroup/it_css slots=141/200        users @it_css queues it_css.q,spillover.q

If there are no jobs running as part of your workgroup, then your per_workgroup quota (of 0 out of N slots) is not displayed at all.
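Each qquota line reports usage as slots=used/limit, so the remaining headroom is the limit minus the used count. The sketch below extracts those numbers from the example output above; the function name parse_qquota is hypothetical, and the regular expression assumes every line has the rule name followed by a slots=used/limit field as shown.

```python
import re

# qquota lines from the example above, embedded for a self-contained run.
SAMPLE = """\
standby_limits/4h  slots=80/800         users traine queues standby-4h.q
standby_cumulative/default slots=80/800         users traine queues standby.q,standby-4h.q
per_workgroup/it_css slots=141/200        users @it_css queues it_css.q,spillover.q
"""

QUOTA_RE = re.compile(r"^(\S+)\s+slots=(\d+)/(\d+)")


def parse_qquota(output):
    """Map each quota rule name to its used, limit and remaining slots."""
    quotas = {}
    for line in output.splitlines():
        match = QUOTA_RE.match(line.strip())
        if match is None:
            continue  # skip anything that is not a slots quota line
        used, limit = int(match.group(2)), int(match.group(3))
        quotas[match.group(1)] = {
            "used": used,
            "limit": limit,
            "remaining": limit - used,
        }
    return quotas


for rule, quota in parse_qquota(SAMPLE).items():
    print(rule, quota["remaining"], "slots remaining")
```

A workgroup with no running jobs simply produces no per_workgroup entry in the result, matching the behavior of qquota itself.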

Use the qdel «job_id» command to remove pending and running jobs from the queue.

For example, to delete job 28000

  qdel 28000

Your job is not deleted

If you have a job that remains in a delete state, even after you try to delete it with the qdel command, then try a force deletion with

  qdel -f 28000

This makes Grid Engine forget about the job without attempting any cleanup on the node(s) being used.

