====== Managing Jobs on Farber ======

Once a user has been able to submit jobs to the queue -- interactive or batch -- the user will from time to time want to know what those jobs are doing. Is the job waiting in a queue for resources to become available, or is it executing? How long has the job been executing? How much CPU time or memory has the job consumed?

Users can query Grid Engine for job information using the ''qstat'' command. The ''qstat'' command has a variety of command line options available to customize and filter what information it displays; discussing all of them is beyond the scope of this document. Please see the ''qstat'' man page for a detailed description of all options.

With no options provided, ''qstat'' defaults to displaying a list of all incomplete jobs submitted by the user. This includes jobs that are waiting in a queue, jobs that are executing, and jobs that are in an error state. The list is presented in a tabular format, with the following columns:

^Column^Description^
|job-ID|Numerical identifier assigned when the job was submitted|
|prior|Scheduling priority of the job; a real number between 0 (low) and 1 (high)|
|name|The name assigned to the job, e.g. with the ''-N'' option to ''qsub''; usually abbreviated to ~10 characters|
|user|The owner of the job|
|state|Current state of the job (see next table)|
|submit/start at|Either the time the job was submitted or the time the job began execution, depending on its state|
|queue|The primary //queue instance// to which the (executing) job has been assigned|
|slots|The number of processing cores assigned to the job; see Jobs with Parallelism for more information|
|ja-task-ID|The secondary identifier for the job; see [[35_parallelism#array-jobs|Array Jobs]] for more information|

The different states in which a job may exist are enumerated by the following codes:

^State Code^Description^
|''qw''|Job is **q**ueued and **w**aiting to execute|
|''t''|Job is ready to execute and is **t**ransferring to its assigned node. Jobs usually go from ''t'' to ''r'' quickly, but very large parallel jobs may persist in ''t'' for a short time.|
|''r''|Job is executing (**r**unning)|
|''Eqw''|An error occurred when Grid Engine attempted to schedule the job, so it has been returned to the ''qw'' state|
|''s''|Job has been **s**uspended so that a higher-priority job can preempt it and use its resources.|
|''d''|Job has completed and is being **d**eleted from its queue.|
|''h''|Displayed when a //hold// has been placed on a job, such that other jobs must complete before it can begin. See Using Job Holds for more information.|

The ''qstat'' command also allows the user to see job status information for any other cluster user by means of the ''-u'' flag. The flag requires a single argument: a username or the wildcard character (''\*''):

  [(it_css:traine)@farber it_css]$ qstat -u traine
  :
  [(it_css:traine)@farber it_css]$ qstat -u \*
  :

Specifying the wildcard argument displays job status for all cluster users.

===== Viewing Status by Hosts =====

In all forms discussed above the output from ''qstat'' focuses on jobs.
To instead view the status information in a host-centric format, the ''-f'' option should be added to the ''qstat'' command. The output from ''qstat -f'' is organized by queue instances (thus, also by compute hosts) with jobs running in a particular queue instance summarized therein:

  [(it_css:traine)@farber it_css]$ qstat -f -q 'it_css*'
  queuename                      qtype resv/used/tot. load_avg arch          states
  ---------------------------------------------------------------------------------
  it_css-qrsh.q@n015             IP    0/0/24         23.76    lx24-amd64    d
  ---------------------------------------------------------------------------------
  it_css-qrsh.q@n016             IP    0/1/24         4.90     lx24-amd64
    71882 0.56283 QLOGIN     traine       r     09/21/2012 10:01:14     1
  ---------------------------------------------------------------------------------
  :
  ---------------------------------------------------------------------------------
  it_css.q+@n015                 BP    0/24/24        23.76    lx24-amd64    dP
    71882 0.56283 RhDimer1   traine       r     08/28/2012 11:32:19    24
  ---------------------------------------------------------------------------------
  it_css.q+@n016                 BP    0/4/24         4.90     lx24-amd64
    71882 0.56283 RhDimer1   traine       r     08/28/2012 11:32:19     4
  ---------------------------------------------------------------------------------
  :

Without the ''-q'' option, the command would have displayed information for every queue instance (and there are many queue instances). The argument to ''-q'' is a queue or queue instance name, and may contain wildcard (''*'') characters as demonstrated here: ''it_css*'' matches all queue instances whose name starts with ''it_css''. The ''-ne'' option is also useful in this regard: it filters the ''qstat -f'' output to only those queue instances with jobs that are executing. As with the job-centric display method, the ''-u'' flag can be used to select which users' jobs are displayed.
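A script can read the ''resv/used/tot.'' field of each queue-instance line in the ''qstat -f'' output above. The following is a minimal Python sketch, not a UD IT tool: the function name ''free_slots'' and the embedded sample text are illustrative assumptions, and it takes the free slots of a queue instance to be its total minus its in-use count. In practice the text would come from running ''qstat -f'' rather than from a string.

```python
import re

# Matches a queue-instance line such as:
#   it_css.q+@n016 BP 0/4/24 4.90 lx24-amd64
# Job lines start with a numeric job ID (no "@"), so they do not match.
QUEUE_INSTANCE = re.compile(
    r"^(?P<qi>\S+@\S+)\s+(?P<qtype>[A-Z]+)\s+"
    r"(?P<resv>\d+)/(?P<used>\d+)/(?P<tot>\d+)\b"
)


def free_slots(text):
    """Return {queue_instance: total - used} for each queue-instance line."""
    free = {}
    for line in text.splitlines():
        match = QUEUE_INSTANCE.match(line.strip())
        if match:
            total = int(match.group("tot"))
            used = int(match.group("used"))
            free[match.group("qi")] = total - used
    return free


# Sample copied from the qstat -f listing above (whitespace simplified).
SAMPLE = """\
queuename qtype resv/used/tot. load_avg arch states
it_css-qrsh.q@n015 IP 0/0/24 23.76 lx24-amd64 d
it_css-qrsh.q@n016 IP 0/1/24 4.90 lx24-amd64
71882 0.56283 QLOGIN traine r 09/21/2012 10:01:14 1
it_css.q+@n015 BP 0/24/24 23.76 lx24-amd64 dP
71882 0.56283 RhDimer1 traine r 08/28/2012 11:32:19 24
it_css.q+@n016 BP 0/4/24 4.90 lx24-amd64
71882 0.56283 RhDimer1 traine r 08/28/2012 11:32:19 4
"""

if __name__ == "__main__":
    for name, count in free_slots(SAMPLE).items():
        print(name, count)
```

Note that this sketch ignores the state column, so a disabled queue instance (state ''d'') still reports free slots even though it accepts no new jobs.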
One helpful feature of displaying jobs in host-centric fashion is that parallel jobs that span multiple hosts will be displayed under each queue instance in which they run: the "RhDimer1" job in the example above uses 28 slots across the ''n015'' and ''n016'' hosts.

The host-centric view displays per-job information similar to that which the job-centric view provided. In addition, it displays information about the queue instances and the host associated with each queue instance. Each Grid Engine queue restricts what kinds of jobs it will accept, and this is summarized under the "qtype" (or queue type) heading. The common queue types are:

^QStat Letter^Description^
|''B''|**B**atch jobs (via ''qsub'') can run in this queue|
|''I''|**I**nteractive jobs (via ''qlogin'') can run in this queue|
|''P''|Jobs which use a **P**arallel environment can run in this queue|

Following the queue type is a summary of the slot usage for the queue instance. The first integer is the number of reserved slots, the second is the number of slots currently in use, and the third is the total number of slots; the number of free slots is just the total minus the in-use count.

The real number that follows the slot summary is the //load// on the host associated with the queue instance. The load is calculated by Grid Engine using a formula that can include not only the host's [[http://en.wikipedia.org/wiki/Load_(computing)|Unix load average]] but also the usage level of any resource on that host of which Grid Engine is aware (memory, disk). A reported load of "''N/A''" usually means that the host is offline (shut down, crashed, etc.). UD IT makes every effort to reboot offline hosts as quickly as possible, or to repair the system if a hardware fault is the reason for its being offline.

The remaining columns show the processor architecture for the host and the state of the queue instance.
Queue states are indicated by a series of letters (just like job states) and the absence of any letters implies the queue is online and without problems.

^QStat Letter^State Description^
|''d''|The queue instance has been **d**isabled by a system administrator and will accept no new jobs.|
|''a''|The load has exceeded a threshold, producing an **a**larm on the queue instance; no new jobs will be scheduled until the alarm condition changes.|
|''u''|The host cannot be contacted on the network, leaving it in an **u**nknown state.|
|''P''|The queue instance is //subordinate// to another queue and jobs running in it may be **p**reempted by jobs entering its superior queue.|

Recall that a load of ''N/A'' usually indicates that a host is offline; this is confirmed when the queue instance also has a state of ''au''. UD IT disables a queue instance (state ''d'') if the host associated with it requires maintenance: a reboot to apply new software features, the replacement of a failing memory module, etc.

==== Host Status, Literally ====

Grid Engine also features a ''qhost'' command that reports the status of hosts alone; no job or queue instance information is displayed:

  [(it_css:traine)@farber it_css]$ qhost -h n013 -h n014
  HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
  -------------------------------------------------------------------------------
  global                  -               -     -       -       -       -       -
  n013                    lx24-amd64     24 24.10   63.0G    2.2G  126.1G  689.5M
  n014                    lx24-amd64     24 24.09   63.0G    2.3G  126.1G     0.0

Without any ''-h'' options the ''qhost'' command displays information for every host. The command has other options that will be discussed elsewhere; view the ''qhost'' man page for a description of all options available.

Grid Engine strives to keep the load (''LOAD'') less than or equal to the core count (''NCPU'') on a host. If the load is significantly greater than the number of cores, then the jobs running on that node will likely not be running at optimum efficiency.
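The comparison of ''LOAD'' against ''NCPU'' can be automated. The following Python sketch is an illustration only: the function names ''parse_qhost'' and ''overloaded'' and the ''factor'' parameter are assumptions made for this example, and the sample text is copied from the ''qhost'' listing above. It assumes the eight-column layout shown there and skips any row whose ''NCPU'' or ''LOAD'' value is not numeric.

```python
def parse_qhost(text):
    """Parse qhost's tabular output into a list of per-host dicts."""
    hosts = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header, the dashed rule, blank lines and the "global" row.
        if len(fields) != 8 or fields[0] in ("HOSTNAME", "global"):
            continue
        name, _arch, ncpu, load = fields[:4]
        try:
            hosts.append({"host": name, "ncpu": int(ncpu), "load": float(load)})
        except ValueError:
            # Non-numeric NCPU or LOAD (assumed to mean "no data"); skip the row.
            continue
    return hosts


def overloaded(hosts, factor=1.0):
    """Return names of hosts whose load exceeds factor * core count."""
    return [h["host"] for h in hosts if h["load"] > factor * h["ncpu"]]


# Sample copied from the qhost listing above (whitespace simplified).
SAMPLE = """\
HOSTNAME ARCH NCPU LOAD MEMTOT MEMUSE SWAPTO SWAPUS
-------------------------------------------------------------------------------
global - - - - - - -
n013 lx24-amd64 24 24.10 63.0G 2.2G 126.1G 689.5M
n014 lx24-amd64 24 24.09 63.0G 2.3G 126.1G 0.0
"""

if __name__ == "__main__":
    hosts = parse_qhost(SAMPLE)
    print(overloaded(hosts))              # load strictly above NCPU
    print(overloaded(hosts, factor=1.5))  # load more than 50% above NCPU
```

With ''factor=1.0'' both sample hosts are reported, because their loads (24.10 and 24.09) are slightly above 24 cores; a larger factor such as 1.5 is one way to look only for hosts whose load is //significantly// greater than the core count.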
Quite often a load much higher than ''NCPU'' is the result of extensive use of swap space (''SWAPUS''). This is usually the hallmark of jobs that are allocating more memory than the host has physically available to it. //Swapping// moves data back and forth between memory and hard disk in order to make the host appear to have more memory at the expense of speed and efficiency. If your jobs are triggering extensive use of swap space, you may need to analyze how your program uses memory and modify the job to better match the resources available.

===== Full Status for a Job =====

So far the per-job output has been limited to a name, submit/run time, and some state flags. Grid Engine maintains a far more extensive set of parameters for each job which can be viewed using the ''-j ''<<''job_id''>> option:

  [(it_css:traine)@farber it_css]$ qstat -j 82518
  ==============================================================
  job_number:                 82518
  exec_file:                  job_scripts/82518
  submission_time:            Mon Oct 1 10:17:34 2012
  owner:                      traine
  uid:                        1201
  group:                      it_css
  gid:                        1002
  sge_o_home:                 /home/1201
  sge_o_log_name:             traine
  sge_o_path:                 /home/1201/bin:/opt/sbin:/usr/sbin:/sbin:/home/1201/bin:/opt/sbin:/usr/sbin:/sbin:/usr/lib64/qt-3.3/bin:/opt/bin:/home/software/GridEngine/6.2u7/bin/lx24-amd64:/usr/local/bin:/bin:/usr/bin:/usr/local/sbin:/usr/sbin:/sbin
  sge_o_shell:                /bin/bash
  sge_o_workdir:              /lustre/work/it_css
  sge_o_host:                 farber
  account:                    sge
  cwd:                        /lustre/work/it_css
  merge:                      y
  hard resource_list:         idle_resources=0,dev_resources=0,exclusive=1,standby_resources=1,scratch_free=1000000
  mail_list:                  traine@farber.hpc.udel.edu
  notify:                     FALSE
  job_name:                   mpibounce.qs
  priority:                   -1023
  jobshare:                   0
  env_list:
  script_file:                mpibounce.qs
  parallel environment:       openmpi range: 48
  verify_suitable_queues:     1
  usage 1:                    cpu=13:48:45, mem=13134.08773 GBs, io=0.07543, vmem=12.830G, maxvmem=12.830G
  scheduling info:            (Collecting of scheduler job information is turned off)

For jobs which are actively executing, the ''usage'' line displays
the accumulated CPU time and memory usage. Note the unit of ''GBs'' (gigabyte-seconds) on the memory usage: just as electricity usage is measured in kilowatt-hours, memory consumption in Grid Engine is a sum over time. Some clusters may have strict accounting that limits users' total CPU or memory usage or even bills for it. If peak instantaneous memory usage (the ''vmem'' and ''maxvmem'' properties shown) were the billable quantity, then a program that uses 16 GB of memory for a few seconds would be treated the same as a program that uses 16 GB of memory for five days!

===== Status in XML Format =====

For all examples cited above the ''qstat'' command displays its output in formats that are //human readable//. Often a human-readable format is difficult for a computer program to understand. Suppose a cluster user found none of the formats summarized above to be to his or her taste. That user could write a program (a Perl or Python script, a C program, etc.) that consumes the output from ''qstat'' and transforms it to his or her preference. The ''qstat'' command aids in this venture by being able to display any of its output in XML rather than human-readable format. Adding the ''-xml'' option to any ''qstat'' command enables display in XML format. The XML document structures exported by ''qstat'' are outside the scope of this documentation; consult the ''qstat'' man page for more details.

===== UD IT Status Commands =====

UD IT has created additional status summary commands that build on the ''qstat'' and ''qhost'' commands provided by Grid Engine. In most cases these commands are conveniences that fill in, on the user's behalf, some of the options that were summarized above. All commands display a terse summary of their options if the ''-h'' or ''--help'' option is provided with the command.

These commands are not part of the Grid Engine job scheduling software.
They are made available on clusters maintained by UD IT that use the Grid Engine job scheduler and are not likely to be available on other clusters to which a user has access.

==== qjobs ====

The ''qjobs'' command displays job status in a more compact format:

  [(it_css:traine)@farber ~]$ qjobs
  ===============================================================================
  JobID     Owner       State       Submitted as
  ===============================================================================
  12568     traine      running     mpibounce.qs
  12584     traine      running     QLOGIN
  ===============================================================================
  2 jobs total.

By default only jobs owned by the user issuing the command are displayed. The ''-g'' option displays all jobs for the user's current group; additionally providing a group name with ''-g'' displays jobs for users who are members of that specific group:

  [(it_css:traine)@farber ~]$ qjobs -g sandler_thermo
  ===============================================================================
  JobID     Owner       State       Submitted as
  ===============================================================================
  80270     odmitr      running     g09ex
  82518     frey        running     job1
  82524     frey        running     job2
  ===============================================================================
  3 jobs total.

The ''-a'' option displays jobs for all cluster users.

==== Nodes Owned by User ====

Given a user's current group, the ''qnodes'' command displays a list of hosts owned by that group. Under the queueing policies adopted by UD IT, users in a group are guaranteed a high level of access to those nodes.

==== Per-Group QStat and QHost ====

The ''qstat'' and ''qhost'' commands can be restricted to only those hosts owned by the user's current group (see ''qnodes'' above). The ''qstatgrp'' and ''qhostgrp'' commands accept the same options as ''qstat'' and ''qhost'', respectively, but limit the display to the list of owned nodes (or queues associated with those nodes).
The ''qstatgrp'' command by default summarizes usage of all queues to which the user has access given his/her current working group. Adding the ''-j'' flag summarizes the jobs executing in those queues rather than summarizing the queues themselves.

The ''qhostgrp'' command by default summarizes usage of all hosts to which the user has access given his/her current working group. Adding the ''-j'' flag summarizes the jobs (including [[abstract:farber:runjobs:queues#farber-standby-queues|standby]]) executing on those hosts rather than summarizing the hosts themselves.

Both ''qstatgrp'' and ''qhostgrp'' accept a ''-g ''<<''group name''>> option to limit the display to an arbitrary group (and not just the user's current working group).

==== Resource-management options ====

Any large cluster will have many nodes with perhaps differing resources, e.g., cores, memory, disk space and accelerators. The resources you can request fall into three categories:

  - Fixed by the configuration: slots and installed memory.
  - Set by a load sensor: CPU load averages and memory usage.
  - Managed by the job scheduler's internal bookkeeping to ensure availability: available memory and floating software licenses.

**Details by cluster**

  * [[abstract:farber:runjobs:schedule_jobs#resource-management-options-on-farber|Farber]]

===== Managing Jobs =====

==== Checking job status ====

Use the **qstat** command to check the status of queued jobs. Use the ''qstat -h'' or ''man qstat'' commands on the login node to view a complete description of available options.
Some of the most often-used options are summarized here:

^ Option ^ Result ^
| ''-j'' <<''job_id''>> | Displays information for specified job(s) |
| ''-u'' <<''username''>> | Displays information for jobs associated with the specified user(s) |
| ''-ext'' | Displays extended information about jobs |
| ''-t'' | Shows additional information about subtasks |
| ''-r'' | Shows resource requirements of jobs |

For example, to list the information for job 62900, type

  qstat -j 62900

To list a table of jobs assigned to user //traine// that displays the resource requirements for each job, type

  qstat -u traine -r

With no options **qstat** defaults to ''qstat -u $USER'', so you get a table for your jobs. With the ''-u'' option the **qstat** command uses //Reduced Format// with the following columns.

^ Column header ^ Description ^
| ''job-ID'' | job id assigned to the job |
| ''user'' | user who owns the job |
| ''name'' | job name |
| ''state'' | current job status, including **qw**(aiting), **s**(uspended), **r**(unning), **h**(old), **E**(rror), **d**(eletion) |
| ''submit/start at'' | submit time (waiting jobs) or start time (running jobs) |
| ''queue'' | name of the queue the job is assigned to (for running or suspended jobs only) |
| ''slots'' | number of slots assigned to the job |

=== A more concise listing ===

The IT-supplied **qjobs** command provides a more convenient listing of job status.

^ Command ^ Description ^
| ''qjobs'' | Displays the status of jobs submitted by you |
| ''qjobs -g'' | Displays the status of jobs submitted by your research group |
| ''qjobs -g'' <<''group name''>> | Displays the status of jobs submitted by members of the named investing-entity |
| ''qjobs -a'' | Displays the status of jobs submitted by **a**ll users |

In all cases the JobID, Owner, State and Name are listed in a table.

=== Job status is qw ===

When your job status is ''qw'' it means your job is queued and waiting to execute.
When you check with ''qstat'' you might see something like this

  [(it_css:traine)@farber it_css]$ qstat -u traine
  job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
  -----------------------------------------------------------------------------------------------------------------
    99154 0.50661 openmpi-pg traine       qw    11/12/2012 14:33:49                                  144

Sometimes your job is stuck: it remains in the ''qw'' state and never starts running. You can use **qalter** to ask the job scheduler why your job is not running. For example, to see the last 10 lines of the job scheduler validation for job 99154, you can type

  [(it_css:traine)@farber it_css]$ qalter -w p 99154 | tail -10
  Job 99154 has no permission for cluster queue "puleo-qrsh.q"
  Job 99154 has no permission for cluster queue "capsl.q+"
  Job 99154 has no permission for cluster queue "spare.q"
  Job 99154 has no permission for cluster queue "it_nss-qrsh.q"
  Job 99154 has no permission for cluster queue "it_nss.q"
  Job 99154 has no permission for cluster queue "it_nss.q+"
  Job 99154 Jobs cannot run because only 72 of 144 requested slots are available
  Job 99154 Jobs can not run in PE "openmpi" because the resource requirements can not be satified
  verification: no suitable queues

In this example, we asked for 144 slots, but only 72 slots are available on the nodes owned by workgroup ''it_css''.

  [(it_css:traine)@farber it_css]$ qstatgrp
  CLUSTER QUEUE      CQLOAD   USED    RES  AVAIL  TOTAL aoACDPS  cdsuE
  it_css-dev.q         0.00      0      0     72     72       0      0
  it_css-qrsh.q        0.00      0      0     72     72       0      0
  it_css.q             0.00      0      0     72     72       0      0
  it_css.q+            0.00      0      0     72     72       0      0
  standby-4h.q         0.27      0      0   4968   5064       0     96
  standby.q            0.27     12      0   4932   5064       0    120

Use **qalter** to change the attributes of the pending job so that it can run, either by reducing the number of slots requested to fit within the workgroup ''it_css'' nodes or by changing the resources requested so that the job targets the [[:abstract:farber:runjobs:queues#farber-standby-queues|standby queue]].
For example, let's change the number of slots requested to 48 instead of 144 by using

  [(it_css:traine)@farber it_css]$ qalter -pe openmpi 48 99154
  modified parallel environment of job 99154
  modified slot range of job 99154
  [(it_css:traine)@farber it_css]$ qstat -u traine
  job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
  -----------------------------------------------------------------------------------------------------------------
    99154 0.50661 openmpi-pg traine       r     11/12/2012 14:33:49                                   48

Another way to get this job running would be to change the resource for the job to run in the standby queue. To do this you must specify all resources since ''qalter'' completely replaces any parameters previously specified for the job by that option. In this example, we alter the job to run in the standby queue by using

  [(it_css:traine)@farber it_css]$ qalter -l idle=0,standby=1 99154
  modified hard resource list of job 99154
  [(it_css:traine)@farber it_css]$ qstat -u traine
  job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
  -----------------------------------------------------------------------------------------------------------------
    99154 0.50661 openmpi-pg traine       r     11/12/2012 15:23:52 standby.q@n016                   144

''qalter'' can only be used to alter jobs that you own!

=== Job status is Eqw ===

When your job status is ''Eqw'' it means an error occurred when Grid Engine attempted to schedule the job, so it has been returned to the ''qw'' state. When you check with ''qstat'' you might see something like this for user ''traine''

  [(it_css:traine)@farber it_css]$ qstat -u traine
  job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
  -----------------------------------------------------------------------------------------------------------------
   686924 0.50509 openmpi-pg traine       Eqw   08/12/2014 19:38:53                                    1

If the state shows ''Eqw'', then use ''qstat -j //job_id// | grep error'' to check for the error.
Here is an example of what you might see

  [traine@farber ~]$ qstat -j 686924 | grep error
  error reason 1: 08/12/2014 22:08:27 [1208:60529]: error: can't chdir to /archive/it_css/traine/ex-openmpi: No such file or directory

This error indicates that a directory or file cannot be found. Verify that the file or directory in question exists, that is, that you have not forgotten to create it and that it is visible from both the head node and the compute node. If it appears to be okay, then the job may have suffered a transient condition such as a failed NFS automount, a temporarily unavailable NFS server, or some other filesystem error.

If you understand the reason and can get it fixed, use ''qmod -cj //job_id//'' to clear the error state like this:

  [traine@farber ~]$ qmod -cj 686924

The job should then eventually run.

==== Checking queue status ====

The **qstat** command can also be used to get status of all queues on the system.

^ Option ^ Result ^
| ''-f'' | Displays summary information for all queues |
| ''-ne'' | Suppresses the display of empty queues. |
| ''-qs'' {a%%|%%c%%|%%d%%|%%o%%|%%s%%|%%u%%|%%A%%|%%C%%|%%D%%|%%E%%|%%S} | Selects queues to be displayed according to state |

With the ''-f'' option, **qstat** uses //full format//, which includes the following columns.

^ Column header ^ Description ^
| ''queuename'' | name of the queue instance (queue@host) |
| ''resv/used/tot.'' | Number of slots reserved/used/total |
| ''states'' | current queue instance state, including **a**(larm), **s**(uspended), **d**(isabled), **h**(old),\\ **E**(rror), **P**(reempted) |

Examples:

List all queues that are unavailable because they are disabled or the slotwise preemption limits have been reached.

  qstat -f -qs dP

List the queues associated with the investing entity //it_css//.

  qstat -f | egrep '(queuename|it_css)'

==== Checking overall queue and node information ====

You can determine overall queue and node information using the ''qstatgrp'', ''qconf'', ''qnodes'' and ''qhostgrp'' commands.
Use a command's ''-h'' option to see its command syntax. To obtain information about a group other than your current group, use the ''-g'' option.

^ Command ^ Illustrative example ^
| ''qstatgrp'' | ''qstatgrp'' shows a summary of the status of the owner-group queues of your current workgroup. |
| ''qstatgrp -j'' | ''qstatgrp -j'' shows the status of each job in the owner-group queues that members\\ of your current workgroup submitted. |
| ''qstatgrp -g'' <<''group name''>> | ''qstatgrp -g it_css'' shows the status of all the owner-group queues for the\\ //it_css// investing-entity. |
| ''qstatgrp -j -g'' <<''group name''>> | ''qstatgrp -j -g it_css'' shows the status of each job in the owner-group queues that\\ members of the //it_css// investing-entity submitted. |
| ''qconf -sql'' | **S**hows all **q**ueues as a **l**ist. |
| ''qconf -sq'' <<''group name''>>* | ''qconf -sq it_css*'' displays the configuration of each owner-group queue for the\\ //it_css// investing-entity. |
| ''qnodes'' | ''qnodes'' displays the names of your owner-group's nodes |
| ''qnodes -g'' <<''group name''>> | ''qnodes -g it_css'' displays the names of the nodes owned by the\\ //it_css// investing-entity. |
| ''qhostgrp'' | ''qhostgrp'' displays the current status of your owner-group's nodes |
| ''qhostgrp -g'' <<''group name''>> | ''qhostgrp -g it_css'' displays the current status of the nodes owned by the\\ //it_css// investing-entity. |
| ''qhostgrp -j -g'' <<''group name''>> | ''qhostgrp -j -g it_css'' shows all jobs running (including [[:abstract:farber:runjobs:queues#farber-standby-queues|standby]] and spillover) in the owner-group nodes for the //it_css// investing-entity. |

==== Checking overall usage of resource quotas ====

Resource quotas are used to help control the standby and spillover queues. Each user has a quota based on the limits set by the [[:abstract:farber:runjobs:queues#farber-standby-queues|standby]] queue specifications for each cluster, and each workgroup has a per_workgroup quota based on the number of slots purchased by the research group.
^ Command ^ Illustrative example ^
| ''qquota -u'' <<''username''>> ''%%|%% grep standby'' | ''qquota -u traine %%|%% grep standby'' displays the current usage of slots by user\\ //traine// in the standby resources. |
| ''qquota -u \* %%|%% grep'' <<''group name''>> | ''qquota -u \* %%|%% grep it_css'' displays the current usage of slots by all\\ members of the //it_css// investing-entity, the per_workgroup quota. |

The example below gives a snapshot of the slots being used by user ''traine'' in the standby queues and the slots being used by all members of the workgroup ''it_css''

  $ qquota -u traine | grep standby
  standby_limits/4h  slots=80/800         users traine queues standby-4h.q
  standby_cumulative/default slots=80/800 users traine queues standby.q,standby-4h.q
  $ qquota -u \* | grep it_css
  per_workgroup/it_css slots=141/200      users @it_css queues it_css.q,spillover.q

If there are no jobs running as part of your workgroup, then your per_workgroup quota (of 0 out of N slots) is not displayed at all.

==== Deleting a job ====

Use the **qdel** <<''job_id''>> command to remove pending and running jobs from the queue.

For example, to delete job 28000

  qdel 28000

**Your job is not deleted**

If you have a job that remains in a delete state, even after you try to delete it with the **qdel** command, then try a force deletion with

  qdel -f 28000

A forced deletion makes Grid Engine forget about the job without attempting any cleanup on the node(s) being used.
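Returning to the ''qquota'' lines shown under "Checking overall usage of resource quotas": each line carries a ''slots=used/limit'' pair, so the remaining headroom is simple arithmetic. The Python sketch below is illustrative only; the function name ''parse_quota_line'' and the dictionary keys are assumptions for this example, and the sample lines are copied from the snapshot above rather than produced by running ''qquota''.

```python
import re

# Matches lines such as:
#   per_workgroup/it_css slots=141/200 users @it_css queues it_css.q,spillover.q
QUOTA_LINE = re.compile(
    r"^(?P<rule>\S+)\s+slots=(?P<used>\d+)/(?P<limit>\d+)\s*(?P<scope>.*)$"
)


def parse_quota_line(line):
    """Return rule name, used, limit and free slots, or None if no match."""
    match = QUOTA_LINE.match(line.strip())
    if match is None:
        return None
    used = int(match.group("used"))
    limit = int(match.group("limit"))
    return {
        "rule": match.group("rule"),
        "used": used,
        "limit": limit,
        "free": limit - used,  # slots still available under this quota
        "scope": match.group("scope"),
    }


# Sample lines copied from the qquota snapshot above.
SAMPLE_LINES = [
    "standby_limits/4h slots=80/800 users traine queues standby-4h.q",
    "standby_cumulative/default slots=80/800 users traine queues standby.q,standby-4h.q",
    "per_workgroup/it_css slots=141/200 users @it_css queues it_css.q,spillover.q",
]

if __name__ == "__main__":
    for text in SAMPLE_LINES:
        quota = parse_quota_line(text)
        print(quota["rule"], quota["free"])
```

For the sample snapshot this gives 720 free slots under each standby rule for ''traine'' and 59 free slots under the ''it_css'' per_workgroup quota.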