The geco-rsrcinfo Command

The geco-rsrcinfo command makes use of the job resource lookup functionality in the GECO core library to:

  • parse the output of qstat -xml -j #.# (either provided on stdin or executed by geco-rsrcinfo itself)
  • output a fragment of bash code that, when evaluated, adds job resource information to the environment
  • serialize job resource information to a file
  • unserialize job resource information from a file

The command is primarily meant to be used by GECO's Grid Engine prolog and epilog scripts, but may also have some value within the context of job scripts.

usage:

  geco-rsrcinfo {options} [task-id]

 options:

  -h/--help                    show this information
  -v/--verbose                 increase the verbosity level (may be used
                                 multiple times)
  -q/--quiet                   decrease the verbosity level (may be used
                                 multiple times)
  -m/--mode=[mode]             operate in the given mode:
                                 prolog:   SGE prolog script
                                 epilog:   SGE epilog script
                                 userenv:  user environment
  -p/--prolog                  shorthand for --mode=prolog
  -e/--epilog                  shorthand for --mode=epilog
  -o/--only                    return information for the native
                               host only, not an array of hosts
  -H/--host=[hostname]         return information for the specified
                               host only, not an array of hosts
  -j/--jobid=[job_id]          request info for a specific job id
                                 (without this option, qstat output
                                 is expected on stdin)
  -s/--serialize=[path]        rather than displaying to stdout, serialize
                                 the resource information to the given
                                 filepath
  -u/--unserialize=[path]      unserialize resource information in the
                                 given filepath and display it
  -r/--qstat-retry=#           if qstat fails to return data for a job, retry
                                 this many times

The [task-id] argument identifies a task of an array job. If no [task-id] is provided, it defaults to 1 (the implicit task id for non-array jobs).
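
In a job script, the task id would typically come from Grid Engine's SGE_TASK_ID environment variable, which Grid Engine sets to the string "undefined" for non-array jobs. A minimal sketch of choosing the [task-id] argument under that assumption follows; the function name resolve_task_id is hypothetical and not part of GECO:

```bash
# Hypothetical helper (not part of GECO): print the task id to pass
# to geco-rsrcinfo. Assumes SGE_TASK_ID is unset, empty, or the
# literal string "undefined" outside of array jobs.
resolve_task_id() {
    local tid="${SGE_TASK_ID:-undefined}"
    if [ "$tid" = "undefined" ]; then
        tid=1    # the implicit task id for non-array jobs
    fi
    printf '%s\n' "$tid"
}

# Possible use inside a job script:
#   geco-rsrcinfo --mode=userenv --jobid="$JOB_ID" "$(resolve_task_id)"
```

Since geco-rsrcinfo already defaults to task 1, the helper mainly matters when the literal string "undefined" would otherwise be passed as the argument.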

Because geco-rsrcinfo depends on qstat output, any node on which it will run must be allowed to query the qmaster and must have the qstat command available.
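
That requirement can be checked before calling the command. The following is a sketch only: have_commands is a hypothetical helper, and the pipeline shows the stdin form implied by the option summary (no --jobid, qstat output piped in), using job 310145 and task 30 as illustrative values:

```bash
# Hypothetical helper: succeed only if every named command is on PATH.
have_commands() {
    local cmd
    for cmd in "$@"; do
        command -v "$cmd" >/dev/null 2>&1 || return 1
    done
}

if have_commands qstat geco-rsrcinfo; then
    # Without --jobid, geco-rsrcinfo expects qstat's XML on stdin.
    qstat -xml -j 310145 | geco-rsrcinfo --mode=userenv 30
else
    echo "qstat and/or geco-rsrcinfo not available on this host" >&2
fi
```

This only verifies that the commands can be found; whether the host is permitted to query the qmaster is a Grid Engine configuration matter that the check does not cover.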

The following outputs bash code that sets up the "user environment" variables with all resource information for task 30 of job 310145:

[frey@farber ~]$ geco-rsrcinfo --mode=userenv --jobid=310145 30
SGE_RESOURCE_HOSTS=(); SGE_RESOURCE_NSLOTS=(); SGE_RESOURCE_MEM=(); SGE_RESOURCE_VMEM=(); SGE_RESOURCE_GPU=(); SGE_RESOURCE_PHI=(); SGE_RESOURCE_HOSTS[0]='n085'; SGE_RESOURCE_NSLOTS[0]=1; SGE_RESOURCE_MEM[0]=3221225472; SGE_RESOURCE_VMEM[0]=0; SGE_RESOURCE_GPU[0]=''; SGE_RESOURCE_PHI[0]=''; SGE_RESOURCE_JOB_MAXRT=0; SGE_RESOURCE_JOB_IS_STANDBY=0; SGE_RESOURCE_JOB_VMEM=0; SGE_RESOURCE_JOB_TRACELEVEL=0; SGE_RESOURCE_JOB_CONFIG_PHI_FOR_USER=0;
[frey@farber ~]$ echo $?
0

Since geco-rsrcinfo returned zero, no issues were encountered while finding the information, and the output can be eval'ed by the shell to effect the desired changes. Use of the --jobid flag means that geco-rsrcinfo issued the qstat command itself (rather than waiting for qstat output on stdin).
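
As a sketch of what a job script might do after the eval, the example below loads an abbreviated copy of the sample output above from a string (so that it runs without geco-rsrcinfo) and walks the arrays. It assumes the per-host arrays are parallel, meaning that index i in each array describes the same host; the variable names rsrc_code and mem_mib are illustrative only:

```bash
# Abbreviated copy of the sample output above. In a job script this
# string would instead come from:
#   geco-rsrcinfo --mode=userenv --jobid=... <task-id>
rsrc_code="SGE_RESOURCE_HOSTS=(); SGE_RESOURCE_NSLOTS=(); SGE_RESOURCE_MEM=(); SGE_RESOURCE_HOSTS[0]='n085'; SGE_RESOURCE_NSLOTS[0]=1; SGE_RESOURCE_MEM[0]=3221225472;"

eval "$rsrc_code"

# Assumption: the arrays are parallel, one element per host.
for i in "${!SGE_RESOURCE_HOSTS[@]}"; do
    mem_mib=$(( SGE_RESOURCE_MEM[i] / 1024 / 1024 ))
    echo "${SGE_RESOURCE_HOSTS[i]}: ${SGE_RESOURCE_NSLOTS[i]} slot(s), ${mem_mib} MiB"
done
```

For the sample data this prints a single line reporting 1 slot and 3072 MiB on n085.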

The prolog and epilog modes of operation produce additional bash commands that are used by GECO's Grid Engine prolog and epilog scripts to write resource summaries to a job's standard output file:

[frey@farber ~]$ eval "$(geco-rsrcinfo --mode=prolog --jobid=310145 30)"
[PROLOG] Resource allocation summary
[PROLOG]   1 core on "n085"
[PROLOG]     Memory limit: 3221225472 bytes
[PROLOG]     nVidia GPU: 
[PROLOG]     Intel Phi: 

Increasing the verbosity level displays additional information (informational and debugging messages). For example, requesting resource information for a nonexistent job:

[frey@farber ~]$ geco-rsrcinfo -vvv --prolog --jobid=1 12
2015-12-22T12:44:19-0500 [22554|DEBUG]:(GECOResource.c:221) executing for task 12 popen("/opt/shared/univa/current/bin/lx-amd64/qstat -xml -j 1", "r")...
ERROR: job 1.12 is not known to the qmaster
[frey@farber ~]$ echo $?
2

Doing likewise, but for a nonexistent task that is part of an existing job:

[frey@farber ~]$ geco-rsrcinfo -vvv --prolog --jobid=310145 128
2015-12-22T12:55:17-0500 [40737|DEBUG]:(GECOResource.c:221) executing for task 128 popen("/opt/shared/univa/current/bin/lx-amd64/qstat -xml -j 310145", "r")...
2015-12-22T12:55:17-0500 [40737|WARN ]:GECOResourceSetCreate: qstat returned inadequate job information for 310145.128 (reason = 8); sleeping then retrying
2015-12-22T12:55:18-0500 [40737|DEBUG]:(GECOResource.c:221) executing for task 128 popen("/opt/shared/univa/current/bin/lx-amd64/qstat -xml -j 310145", "r")...
2015-12-22T12:55:18-0500 [40737|WARN ]:GECOResourceSetCreate: qstat returned inadequate job information for 310145.128 (reason = 8); sleeping then retrying
2015-12-22T12:55:20-0500 [40737|DEBUG]:(GECOResource.c:221) executing for task 128 popen("/opt/shared/univa/current/bin/lx-amd64/qstat -xml -j 310145", "r")...
ERROR: resource information not available for job 310145.128; it either does not exist or is not running
[frey@farber ~]$ echo $?
22
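
The transcripts above show three exit statuses: 0 on success, 2 for a job unknown to the qmaster, and 22 for a task whose information was unavailable. Since no other statuses are shown, a caller is probably safest treating any nonzero status as a failure and leaving the environment untouched. A minimal sketch of that pattern, where eval_if_ok is a hypothetical wrapper and not part of GECO:

```bash
# Hypothetical wrapper: run a command that prints shell code and eval
# that code only if the command exited with status 0. On failure the
# command's own exit status is returned and nothing is eval'ed.
eval_if_ok() {
    local code rc=0
    code="$("$@")" || rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "command failed with status $rc; environment left untouched" >&2
        return "$rc"
    fi
    eval "$code"
}

# Possible use:
#   eval_if_ok geco-rsrcinfo --mode=userenv --jobid=310145 30 || exit $?
```

Capturing the output first, rather than writing eval "$(...)" directly, keeps partial output from a failed run out of the environment.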

GECO's resource information serialization uses a fixed, typed, textual format that is simple and fast to parse:

[frey@farber ~]$ geco-rsrcinfo --jobid=310145 --serialize=./310145.jobdata 30
[frey@farber ~]$ echo $?
0
[frey@farber ~]$ cat 310145.jobdata 
GECOResourceSet_v1{li310145,li30,lf0.000000,b0,lf0.000000,i0,i1,b1,b0,s8:Disc7_5.,s7:calhoun,s8:hmichael,s10:/home/1429,s4:n085{b0,i1,lf3221225472.000000,lf0.000000,s0:,s0:}}
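
Reading the sample above, each field appears to carry a type prefix (li, lf, i, b, and s<length>: for strings), with strings stored as a character count followed by a colon and the raw text. This is an inference from the single sample shown, not a specification of the format. Under that assumption, a length-prefixed string token could be decoded as sketched below; decode_string_token, DECODED, and REST are hypothetical names:

```bash
# Hypothetical decoder for one token of the assumed form s<len>:<text>.
# On success, sets DECODED to the string and REST to whatever follows it.
decode_string_token() {
    local input="$1" len
    [[ "$input" =~ ^s([0-9]+): ]] || return 1
    len="${BASH_REMATCH[1]}"
    input="${input#s"$len":}"    # drop the "s<len>:" prefix
    DECODED="${input:0:len}"     # take exactly <len> characters
    REST="${input:len}"
}

decode_string_token 's8:Disc7_5.,s7:calhoun'
echo "decoded='$DECODED' rest='$REST'"
```

Reading by length rather than splitting on commas means a string that itself contains a comma or brace would still be decoded correctly, which may be why the format records the length.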

The job resource information can later be reconstituted from that file, in the context of GECO, rather than by querying qstat again:

[frey@farber ~]$ geco-rsrcinfo --unserialize=./310145.jobdata --epilog
SGE_EPILOG_HOSTS=(); SGE_EPILOG_NSLOTS=(); SGE_EPILOG_MEM=(); SGE_EPILOG_VMEM=(); SGE_EPILOG_GPU=(); SGE_EPILOG_PHI=(); SGE_EPILOG_HOSTS[0]='n085'; SGE_EPILOG_NSLOTS[0]=1; SGE_EPILOG_MEM[0]=3221225472; SGE_EPILOG_VMEM[0]=0; SGE_EPILOG_GPU[0]=''; SGE_EPILOG_PHI[0]=''; SGE_EPILOG_JOB_MAXRT=0; SGE_EPILOG_JOB_IS_STANDBY=0; SGE_EPILOG_JOB_VMEM=0; SGE_EPILOG_JOB_TRACELEVEL=0; SGE_EPILOG_JOB_CONFIG_PHI_FOR_USER=0;
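
Taken together, the two options suggest a pattern in which a prolog caches the resource information and the epilog reads it back. The sketch below illustrates that idea only; the file naming scheme, the GECO_STATE_DIR variable, and the function names are all hypothetical and may differ from what GECO's own prolog and epilog scripts do:

```bash
# Hypothetical cache location: <dir>/<job_id>.<task_id>.jobdata
jobdata_path() {
    printf '%s/%s.%s.jobdata\n' "${GECO_STATE_DIR:-/tmp}" "$1" "$2"
}

# Prolog side: query qstat once and serialize the result to the cache file.
save_job_resources() {
    geco-rsrcinfo --jobid="$1" --serialize="$(jobdata_path "$1" "$2")" "$2"
}

# Epilog side: emit the epilog shell code from the cache file, with no
# further qstat query.
load_job_resources_epilog() {
    geco-rsrcinfo --unserialize="$(jobdata_path "$1" "$2")" --epilog
}

# Possible use:
#   save_job_resources 310145 30                      # in the prolog
#   eval "$(load_job_resources_epilog 310145 30)"     # in the epilog
```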

Last modified: 2015-12-22 13:16