abstract:farber:runjobs:job_status

Differences

This shows you the differences between two versions of the page.

abstract:farber:runjobs:job_status [2018-05-21 23:37] sraskar
abstract:farber:runjobs:job_status [2021-10-13 16:49] (current) – [Per-Group QStat and QHost] anita
Line 1: Line 1:
-====== Viewing Job Status Information ======
+====== Managing Jobs on Farber ======
 
 Once a user has been able to submit jobs to the queue -- interactive or batch -- the user will from time to time want to know what those jobs are doing.  Is the job waiting in a queue for resources to become available, or is it executing?  How long has the job been executing?  How much CPU time or memory has the job consumed?  Users can query Grid Engine for job information using the ''qstat'' command.  The ''qstat'' command has a variety of command line options available to customize and filter what information it displays; discussing all of them is beyond the scope of this document.  Please see the ''qstat'' man page for a detailed description of all options.
Line 29: Line 29:
 The ''qstat'' command also allows the user to see job status information for any other cluster user by means of the ''-u'' flag.  The flag requires a single argument:  a username or the wildcard character (''\*''):
 <code bash>
-[(it_css:traine)@mills it_css]$ qstat -u traine
+[(it_css:traine)@farber it_css]$ qstat -u traine
    :
-[(it_css:traine)@mills it_css]$ qstat -u \*
+[(it_css:traine)@farber it_css]$ qstat -u \*
    :
 </code>
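The listing produced by ''qstat -u'' can be condensed into per-user job counts. The following is a minimal sketch and not part of the original page: ''count_jobs_by_state'' is a hypothetical helper that assumes the column layout ''qstat'' prints elsewhere on this page (two header lines, then the user in column 4 and the state in column 5), and the sample rows are made up for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: tally jobs per user and state from `qstat -u` output
# read on standard input. Assumes two header lines, then one row per job with
# the user in column 4 and the state in column 5.
count_jobs_by_state() {
    awk 'NR > 2 && NF >= 5 { count[$4 " " $5]++ }
         END { for (key in count) print key, count[key] }' | LC_ALL=C sort
}

# Made-up sample standing in for real `qstat -u \*` output.
sample_qstat_output() {
    cat <<'EOF'
job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
-----------------------------------------------------------------------------------------------------------------
  1001 0.50000 sim_a      traine       r     05/21/2018 10:00:00 it_css.q@n013                      4
  1002 0.00000 sim_b      traine       qw    05/21/2018 10:05:00                                   144
  1003 0.50000 fit_c      user2        r     05/21/2018 10:07:00 it_css.q@n014                      8
EOF
}

# Each output line reads: user, state, number of jobs.
sample_qstat_output | count_jobs_by_state
```

On the cluster the same helper could be fed real output with ''qstat -u \* | count_jobs_by_state''. The backslash keeps the shell from expanding the ''*'' into file names before ''qstat'' sees it.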
Line 40: Line 40:
 In all forms discussed above the output from ''qstat'' focuses on jobs.  To instead view the status information in a host-centric format, the ''-f'' option should be added to the ''qstat'' command.  The output from ''qstat -f'' is organized by queue instances (thus, also by compute hosts) with jobs running in a particular queue instance summarized therein:
 <code bash>
-[(it_css:traine)@mills it_css]$ qstat -f -q 'it_css*'
+[(it_css:traine)@farber it_css]$ qstat -f -q 'it_css*'
 queuename                      qtype resv/used/tot. load_avg arch          states
 ---------------------------------------------------------------------------------
Line 91: Line 91:
  
 <code bash>
-[(it_css:traine)@mills it_css]$ qhost -h n013 -h n014
+[(it_css:traine)@farber it_css]$ qhost -h n013 -h n014
 HOSTNAME                ARCH         NCPU  LOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
 -------------------------------------------------------------------------------
Line 110: Line 110:
  
 <code bash>
-[(it_css:traine)@mills it_css]$ qstat -j 82518
+[(it_css:traine)@farber it_css]$ qstat -j 82518
 ==============================================================
 job_number:                 82518
Line 124: Line 124:
 sge_o_shell:                /bin/bash
 sge_o_workdir:              /lustre/work/it_css
-sge_o_host:                 mills
+sge_o_host:                 farber
 account:                    sge
 cwd:                        /lustre/work/it_css
 merge:                      y
 hard resource_list:         idle_resources=0,dev_resources=0,exclusive=1,standby_resources=1,scratch_free=1000000
-mail_list:                  traine@mills.hpc.udel.edu
+mail_list:                  traine@farber.hpc.udel.edu
 notify:                     FALSE
 job_name:                   mpibounce.qs
Line 159: Line 159:
  
 <code bash>
-[(it_css:traine)@mills ~]$ qjobs
+[(it_css:traine)@farber ~]$ qjobs
 ===============================================================================
 JobID  Owner              State    Submitted as
Line 172: Line 172:
  
 <code bash>
-[(it_css:traine)@mills ~]$ qjobs -g sandler_thermo
+[(it_css:traine)@farber ~]$ qjobs -g sandler_thermo
 ===============================================================================
 JobID  Owner              State    Submitted as
Line 195: Line 195:
 The ''qstatgrp'' command by default summarizes usage of all queues to which the user has access given his/her current working group.  Adding the ''-j'' flag summarizes the jobs executing in those queues rather than summarizing the queues themselves.
 
-The ''qhostgrp'' command by default summarizes usage of all hosts to which the user has access given his/her current working group.  Adding the ''-j'' flag summarizes the jobs (including [[general/jobsched/standby|standby]]) executing on those hosts rather than summarizing the hosts themselves.
+The ''qhostgrp'' command by default summarizes usage of all hosts to which the user has access given his/her current working group.  Adding the ''-j'' flag summarizes the jobs (including [[abstract:farber:runjobs:queues#farber-standby-queues|standby]]) executing on those hosts rather than summarizing the hosts themselves.
 
 Both ''qstatgrp'' and ''qhostgrp'' accept a ''-g ''<<''group name''>> option to limit to an arbitrary group (and not just the user's current working group).
Line 211: Line 211:
 **Details by cluster**
 
-   * [[clusters:mills:runapps#resource-management-options|Mills]]
-   * [[clusters:farber:runapps#resource-management-options|Farber]]
+   * [[abstract:farber:runjobs:schedule_jobs#resource-management-options-on-farber|Farber]]
 
 ===== Managing Jobs =====
Line 265: Line 264:
  
 <code base>
-[(it_css:traine)@mills it_css]$ qstat -u traine
+[(it_css:traine)@farber it_css]$ qstat -u traine
 job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
 -----------------------------------------------------------------------------------------------------------------
Line 274: Line 273:
  
 <code base>
-[(it_css:traine)@mills it_css]$ qalter -w p 99154 | tail -10
+[(it_css:traine)@farber it_css]$ qalter -w p 99154 | tail -10
 Job 99154 has no permission for cluster queue "puleo-qrsh.q"
 Job 99154 has no permission for cluster queue "capsl.q+"
Line 289: Line 288:
  
 <code base>
-[(it_css:traine)@mills it_css]$ qstatgrp
+[(it_css:traine)@farber it_css]$ qstatgrp
 CLUSTER QUEUE                   CQLOAD   USED    RES  AVAIL  TOTAL aoACDPS  cdsuE
 it_css-dev.q                      0.00      0      0     72     72      0      0
Line 299: Line 298:
 </code>
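When a job is pending because it asks for more slots than the workgroup's queues can offer, the ''AVAIL'' column of ''qstatgrp'' is the figure to compare against. The following is a minimal sketch under stated assumptions: ''queues_with_room'' is a hypothetical helper, not a cluster command, and it assumes one header line followed by rows in the column order shown above (queue name, CQLOAD, USED, RES, AVAIL, TOTAL). The ''it_css-dev.q'' row is taken from this page; the ''example.q'' row is made up.

```bash
#!/usr/bin/env bash
# Hypothetical helper: list cluster queues whose AVAIL column is at least the
# requested slot count. Reads `qstatgrp` output on standard input and assumes
# one header line, then rows of: queue CQLOAD USED RES AVAIL TOTAL ...
queues_with_room() {
    local wanted="$1"
    awk -v wanted="$wanted" \
        'NR > 1 && NF >= 6 && $5 + 0 >= wanted + 0 { print $1, $5 }'
}

# Sample input: the first data row appears on this page, the second is made up.
sample_qstatgrp_output() {
    cat <<'EOF'
CLUSTER QUEUE                   CQLOAD   USED    RES  AVAIL  TOTAL aoACDPS  cdsuE
it_css-dev.q                      0.00      0      0     72     72      0      0
example.q                         0.10     24      0    168    192      0      0
EOF
}

sample_qstatgrp_output | queues_with_room 144   # queues with at least 144 slots free
sample_qstatgrp_output | queues_with_room 48    # queues with at least 48 slots free
```

On the cluster this could be run as ''qstatgrp | queues_with_room 144''. A large enough ''AVAIL'' value is only a first check: the job must also have permission for the queue and a matching parallel environment, which ''qalter -w p'' reports.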
  
-Use **qalter** to change the attributes of the pending job such as reducing the number of slots requested to be within the workgroup ''it_css'' nodes or change the resources specified to the [[general:jobsched:standby|standby queue]] so the job could run. For example, let's change the number of slots requested to 48 instead of 144 by using
+Use **qalter** to change the attributes of the pending job such as reducing the number of slots requested to be within the workgroup ''it_css'' nodes or change the resources specified to the [[:abstract:farber:runjobs:queues#farber-standby-queues|standby queue]] so the job could run. For example, let's change the number of slots requested to 48 instead of 144 by using
 
 <code base>
-[(it_css:traine)@mills it_css]$ qalter -pe openmpi 48 99154
+[(it_css:traine)@farber it_css]$ qalter -pe openmpi 48 99154
 modified parallel environment of job 99154
 modified slot range of job 99154
-[(it_css:traine)@mills it_css]$ qstat -u traine
+[(it_css:traine)@farber it_css]$ qstat -u traine
 job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
 -----------------------------------------------------------------------------------------------------------------
Line 313: Line 312:
 Another way to get this job running would be to change the resource for the job to run in the standby queue.  To do this you must specify all resources since ''qalter'' completely replaces any parameters previously specified for the job by that option. In this example, we alter the job to run in the standby queue by using
 <code base>
-[(it_css:traine)@mills it_css]$ qalter -l idle=0,standby=1 99154
+[(it_css:traine)@farber it_css]$ qalter -l idle=0,standby=1 99154
 modified hard resource list of job 99154
-[(it_css:traine)@mills it_css]$ qstat -u traine
+[(it_css:traine)@farber it_css]$ qstat -u traine
 job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
 -----------------------------------------------------------------------------------------------------------------
Line 328: Line 327:
  
 <code base>
-[(it_css:traine)@mills it_css]$ qstat -u traine
+[(it_css:traine)@farber it_css]$ qstat -u traine
 job-ID  prior   name       user         state submit/start at     queue                          slots ja-task-ID
 -----------------------------------------------------------------------------------------------------------------
Line 337: Line 336:
  
 <code base>
-[traine@mills ~]$ qstat -j 686924 | grep error
+[traine@farber ~]$ qstat -j 686924 | grep error
 error reason    1:          08/12/2014 22:08:27 [1208:60529]: error: can't chdir to /archive/it_css/traine/ex-openmpi: No such file or directory
 </code>
Line 346: Line 345:
  
 <code base>
-[traine@mills ~]$ qmod -cj 686924
+[traine@farber ~]$ qmod -cj 686924
 </code>
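Before clearing an error state it helps to read the reason without the field label in front of it. The following is a minimal sketch, not part of the original page: ''job_error_reasons'' is a hypothetical helper that assumes the ''error reason'' line format shown in the ''qstat -j'' output on this page. The sample reuses that error line; the ''job_number'' line is added for illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print only the message part of each "error reason" line
# from `qstat -j <job_id>` output read on standard input.
job_error_reasons() {
    sed -n 's/^error reason[[:space:]]*[0-9]*:[[:space:]]*//p'
}

# Sample input: the error line is copied from this page, the first line is
# added for illustration.
sample_qstat_j_output() {
    cat <<'EOF'
job_number:                 686924
error reason    1:          08/12/2014 22:08:27 [1208:60529]: error: can't chdir to /archive/it_css/traine/ex-openmpi: No such file or directory
EOF
}

# Prints the timestamped message and drops every other field.
sample_qstat_j_output | job_error_reasons
```

On the cluster this could be run as ''qstat -j 686924 | job_error_reasons''. Once the cause has been corrected (here, the missing working directory), ''qmod -cj 686924'' clears the error state as shown above.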
  
Line 395: Line 394:
 | ''qhostgrp''  | ''qhostgrp'' displays the current status of your owner-group's nodes |
 | ''qhostgrp -g'' <<//investing_entity//>>  | ''qhostgrp -g it_css'' displays the current status of the nodes owned by the\\ //it_css// investing-entity. |
-| ''qhostgrp -j -g'' <<//investing_entity//>>  | ''qhostgrp -j -g it_css'' shows all jobs running (including [[general/jobsched/standby|standby]] and spillover) in the owner-group nodes for the //it_css//  investing-entity. |
+| ''qhostgrp -j -g'' <<//investing_entity//>>  | ''qhostgrp -j -g it_css'' shows all jobs running (including [[:abstract:farber:runjobs:queues#farber-standby-queues|standby]] and spillover) in the owner-group nodes for the //it_css//  investing-entity. |
  
 ==== Checking overall usage of resource quotas ====
 
-Resource quotas are used to help control the standby and spillover queues.  Each user has a quota based on the limits set by the [[general/jobsched/standby|standby]] queue specifications for each cluster, and each workgroup has a per_workgroup quota based on the number of slots purchased by the research group.
+Resource quotas are used to help control the standby and spillover queues.  Each user has a quota based on the limits set by the [[:abstract:farber:runjobs:queues#farber-standby-queues|standby]] queue specifications for each cluster, and each workgroup has a per_workgroup quota based on the number of slots purchased by the research group.
 
 ^ Command ^ Illustrative example ^
-| ''qquota -u'' <<//username//>> ''| grep standby''  | ''qquota -u traine | grep standby'' displays the current usage of slots by user\\ //traine// in the standby resources.  |
+| ''qquota -u'' <<//username//>> ''| grep standby''  | ''qquota -u traine | grep standby'' displays the current usage of slots by user\\ //traine// in the standby resources.  |
 | ''qquota -u \* | grep'' <<//investing_entity//>>  | ''qquota -u \* | grep it_css'' displays the current usage of slots being used by all\\ members of the //it_css// investing-entity, the per_workgroup quota. |
  
  • Last modified: 2018-05-21 23:37
  • by sraskar