The geco-rsrcinfo Command
The geco-rsrcinfo command makes use of the job resource lookup functionality in the GECO core library to:
- parse the output of qstat -xml -j #.# (either provided on stdin or executed by geco-rsrcinfo itself)
- output a chunk of bash scripting that, when executed, adds job resource information to the environment
- serialize job resource information to a file
- unserialize job resource information from a file
The command is primarily meant to be used by GECO's Grid Engine prolog and epilog scripts, but may also have some value within the context of job scripts.
usage:

    geco-rsrcinfo {options} [task-id]

  options:

    -h/--help                    show this information
    -v/--verbose                 increase the verbosity level (may be used
                                 multiple times)
    -q/--quiet                   decrease the verbosity level (may be used
                                 multiple times)
    -m/--mode=[mode]             operate in the given mode:
                                     prolog: SGE prolog script
                                     epilog: SGE epilog script
                                     userenv: user environment
    -p/--prolog                  shorthand for --mode=prolog
    -e/--epilog                  shorthand for --mode=epilog
    -o/--only                    return information for the native host only,
                                 not an array of hosts
    -H/--host=[hostname]         return information for the specified host
                                 only, not an array of hosts
    -j/--jobid=[job_id]          request info for a specific job id (without
                                 this option, qstat output is expected on stdin)
    -s/--serialize=[path]        rather than displaying to stdout, serialize
                                 the resource information to the given filepath
    -u/--unserialize=[path]      unserialize resource information in the given
                                 filepath and display it
    -r/--qstat-retry=#           if qstat fails to return data for a job, retry
                                 this many times
The [task-id] argument applies to array jobs. If no [task-id] is provided, it defaults to 1 (the implicit task id for non-array jobs).
Since geco-rsrcinfo depends on the output of a qstat command, any node on which geco-rsrcinfo will be run must be allowed to query the qmaster and must have the qstat command available to it.
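A prolog, epilog, or job script can check the second of these requirements before attempting a lookup. The following is only a minimal sketch: require_cmd is a hypothetical helper name, not part of GECO, and it confirms only that a command is on the PATH. It does not verify that the node is allowed to query the qmaster.

```bash
#!/usr/bin/env bash

# Hypothetical helper (not part of GECO): succeed only if the named
# command can be found on the current PATH.
require_cmd() {
    local cmd=$1
    if ! command -v "$cmd" >/dev/null 2>&1; then
        printf 'ERROR: %s not found in PATH\n' "$cmd" >&2
        return 1
    fi
}

# Check both tools before relying on a resource lookup.
if require_cmd qstat && require_cmd geco-rsrcinfo; then
    echo "qstat and geco-rsrcinfo are available"
else
    echo "resource lookup is not possible on this node" >&2
fi
```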
Examples
Output bash code that sets up the "user environment" environment variables to contain all resource information for task 30 of job 310145:
[frey@farber ~]$ geco-rsrcinfo --mode=userenv --jobid=310145 30
SGE_RESOURCE_HOSTS=(); SGE_RESOURCE_NSLOTS=(); SGE_RESOURCE_MEM=(); SGE_RESOURCE_VMEM=(); SGE_RESOURCE_GPU=(); SGE_RESOURCE_PHI=();
SGE_RESOURCE_HOSTS[0]='n085'; SGE_RESOURCE_NSLOTS[0]=1; SGE_RESOURCE_MEM[0]=3221225472; SGE_RESOURCE_VMEM[0]=0; SGE_RESOURCE_GPU[0]=''; SGE_RESOURCE_PHI[0]='';
SGE_RESOURCE_JOB_MAXRT=0;
SGE_RESOURCE_JOB_IS_STANDBY=0;
SGE_RESOURCE_JOB_VMEM=0;
SGE_RESOURCE_JOB_TRACELEVEL=0;
SGE_RESOURCE_JOB_CONFIG_PHI_FOR_USER=0;
[frey@farber ~]$ echo $?
0
Since geco-rsrcinfo returns zero, no issues were encountered while finding the information, and the output can be eval'ed by the shell to effect the desired changes. Use of the --jobid flag means that geco-rsrcinfo issued the qstat command itself (rather than waiting for qstat output on stdin).
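A job script that wants to use this output should eval it only when the command succeeded. The sketch below shows one way to do that and then walk the resulting arrays. It makes two assumptions so that it runs anywhere: fake_rsrcinfo is a stand-in that prints a subset of the output shown above, and load_resources is a hypothetical wrapper name, not a GECO function. In a real job script, fake_rsrcinfo would be replaced by the geco-rsrcinfo invocation.

```bash
#!/usr/bin/env bash

# Stand-in for `geco-rsrcinfo --mode=userenv --jobid=310145 30`.
# It prints a subset of the output from the session above.
fake_rsrcinfo() {
    cat <<'EOF'
SGE_RESOURCE_HOSTS=(); SGE_RESOURCE_NSLOTS=(); SGE_RESOURCE_MEM=();
SGE_RESOURCE_HOSTS[0]='n085'; SGE_RESOURCE_NSLOTS[0]=1; SGE_RESOURCE_MEM[0]=3221225472;
EOF
}

# Hypothetical wrapper: run the given command, and eval its output
# only if the command exited with status 0.
load_resources() {
    local out rc
    out=$("$@")
    rc=$?
    if (( rc != 0 )); then
        printf 'resource lookup failed with status %d\n' "$rc" >&2
        return "$rc"
    fi
    eval "$out"
}

if load_resources fake_rsrcinfo; then
    # One array element per host; report slots and memory in MiB.
    for i in "${!SGE_RESOURCE_HOSTS[@]}"; do
        printf '%s: %d slot(s), %d MiB\n' \
            "${SGE_RESOURCE_HOSTS[i]}" \
            "${SGE_RESOURCE_NSLOTS[i]}" \
            $(( SGE_RESOURCE_MEM[i] / 1048576 ))
    done
fi
```

Capturing the output first, rather than writing eval "$(...)" directly, keeps the exit status of the command available for the check.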
The prolog and epilog modes of operation produce additional bash commands that are used by GECO's Grid Engine prolog and epilog scripts to write resource summaries to a job's standard output file:
[frey@farber ~]$ eval "$(geco-rsrcinfo --mode=prolog --jobid=310145 30)"
[PROLOG] Resource allocation summary
[PROLOG] 1 core on "n085"
[PROLOG] Memory limit: 3221225472 bytes
[PROLOG] nVidia GPU:
[PROLOG] Intel Phi:
Verbosity
Increasing the verbosity level displays additional information (informational messages, debugging messages). For example, requesting resource information for a nonexistent job:
[frey@farber ~]$ geco-rsrcinfo -vvv --prolog --jobid=1 12
2015-12-22T12:44:19-0500 [22554|DEBUG]:(GECOResource.c:221) executing for task 12 popen("/opt/shared/univa/current/bin/lx-amd64/qstat -xml -j 1", "r")...
ERROR: job 1.12 is not known to the qmaster
[frey@farber ~]$ echo $?
2
Doing likewise, but for a nonexistent task that is part of an extant job:
[frey@farber ~]$ geco-rsrcinfo -vvv --prolog --jobid=310145 128
2015-12-22T12:55:17-0500 [40737|DEBUG]:(GECOResource.c:221) executing for task 128 popen("/opt/shared/univa/current/bin/lx-amd64/qstat -xml -j 310145", "r")...
2015-12-22T12:55:17-0500 [40737|WARN ]:GECOResourceSetCreate: qstat returned inadequate job information for 310145.128 (reason = 8); sleeping then retrying
2015-12-22T12:55:18-0500 [40737|DEBUG]:(GECOResource.c:221) executing for task 128 popen("/opt/shared/univa/current/bin/lx-amd64/qstat -xml -j 310145", "r")...
2015-12-22T12:55:18-0500 [40737|WARN ]:GECOResourceSetCreate: qstat returned inadequate job information for 310145.128 (reason = 8); sleeping then retrying
2015-12-22T12:55:20-0500 [40737|DEBUG]:(GECOResource.c:221) executing for task 128 popen("/opt/shared/univa/current/bin/lx-amd64/qstat -xml -j 310145", "r")...
ERROR: resource information not available for job 310145.128; it either does not exist or is not running
[frey@farber ~]$ echo $?
22
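The two failures above exit with different statuses, so a calling script can tell them apart. The sketch below maps a status to a short description. It names only the statuses observed in the sessions on this page (0, 2, and 22), since the complete set of exit statuses is not listed here, and describe_status is a hypothetical helper name rather than part of GECO.

```bash
#!/usr/bin/env bash

# Hypothetical helper: describe an exit status from geco-rsrcinfo.
# Only statuses seen in the example sessions are named; any other
# value is reported as unrecognized rather than guessed at.
describe_status() {
    case $1 in
        0)  echo "ok" ;;
        2)  echo "job not known to the qmaster" ;;
        22) echo "resource information not available" ;;
        *)  echo "unrecognized status $1" ;;
    esac
}

# Demonstrate with the observed statuses plus one unlisted value.
for rc in 0 2 22 1; do
    printf '%3d: %s\n' "$rc" "$(describe_status "$rc")"
done
```

In a script, the helper would be called with $? immediately after the geco-rsrcinfo invocation, before any other command overwrites the status.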
Serializing/Unserializing Information
GECO's resource information serialization uses a fixed, typed textual format that is easy to parse, which keeps things simple and fast.
[frey@farber ~]$ geco-rsrcinfo --jobid=310145 --serialize=./310145.jobdata 30
[frey@farber ~]$ echo $?
0
[frey@farber ~]$ cat 310145.jobdata
GECOResourceSet_v1{li310145,li30,lf0.000000,b0,lf0.000000,i0,i1,b1,b0,s8:Disc7_5.,s7:calhoun,s8:hmichael,s10:/home/1429,s4:n085{b0,i1,lf3221225472.000000,lf0.000000,s0:,s0:}}
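To illustrate why a format like this is easy to parse, the sketch below splits the sample record above into typed tokens. It is not GECO's parser. The meanings of the type tags are assumptions inferred from the sample alone (li as a long integer, lf as a floating-point value, i as an integer, b as a boolean, and sN: as a string of exactly N characters), and geco_tokens is a hypothetical function name. The meaning of each field is not interpreted.

```bash
#!/usr/bin/env bash

# Hypothetical tokenizer: print one "type<TAB>value" line per token.
# ASSUMED tag meanings, inferred from the sample record only:
#   li = long integer, lf = floating point, i = integer,
#   b = boolean, sN: = string of exactly N characters
geco_tokens() {
    local data=$1 pos=0 rest n
    local re_head='^GECOResourceSet_v[0-9]+'
    local re_str='^s([0-9]+):'
    local re_lf='^lf(-?[0-9]+(\.[0-9]+)?)'
    local re_li='^li(-?[0-9]+)'
    local re_i='^i(-?[0-9]+)'
    local re_b='^b([01])'
    while (( pos < ${#data} )); do
        rest=${data:pos}
        if [[ $rest =~ $re_str ]]; then
            # Length-prefixed: take N characters verbatim, so the
            # string itself may contain commas or braces.
            n=$(( 10#${BASH_REMATCH[1]} ))
            pos=$(( pos + ${#BASH_REMATCH[0]} ))
            printf 'string\t%s\n' "${data:pos:n}"
            pos=$(( pos + n ))
        elif [[ $rest =~ $re_lf ]]; then
            printf 'float\t%s\n' "${BASH_REMATCH[1]}"
            pos=$(( pos + ${#BASH_REMATCH[0]} ))
        elif [[ $rest =~ $re_li ]]; then
            printf 'long\t%s\n' "${BASH_REMATCH[1]}"
            pos=$(( pos + ${#BASH_REMATCH[0]} ))
        elif [[ $rest =~ $re_i ]]; then
            printf 'int\t%s\n' "${BASH_REMATCH[1]}"
            pos=$(( pos + ${#BASH_REMATCH[0]} ))
        elif [[ $rest =~ $re_b ]]; then
            printf 'bool\t%s\n' "${BASH_REMATCH[1]}"
            pos=$(( pos + ${#BASH_REMATCH[0]} ))
        elif [[ $rest =~ $re_head ]]; then
            printf 'header\t%s\n' "${BASH_REMATCH[0]}"
            pos=$(( pos + ${#BASH_REMATCH[0]} ))
        else
            # Structural characters: braces open and close a group,
            # commas only separate values.
            case ${rest:0:1} in
                '{') printf 'open\t{\n' ;;
                '}') printf 'close\t}\n' ;;
                ',') ;;
                *)   printf 'unexpected character at offset %d\n' "$pos" >&2
                     return 1 ;;
            esac
            pos=$(( pos + 1 ))
        fi
    done
}

sample='GECOResourceSet_v1{li310145,li30,lf0.000000,b0,lf0.000000,i0,i1,b1,b0,s8:Disc7_5.,s7:calhoun,s8:hmichael,s10:/home/1429,s4:n085{b0,i1,lf3221225472.000000,lf0.000000,s0:,s0:}}'
geco_tokens "$sample"
```

Because every value carries its type tag and every string carries its length, the record can be read in a single left-to-right pass without escaping or lookahead.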
Rather than querying via qstat, the job resource information can later be reconstituted in the context of GECO:
[frey@farber ~]$ geco-rsrcinfo --unserialize=./310145.jobdata --epilog
SGE_EPILOG_HOSTS=(); SGE_EPILOG_NSLOTS=(); SGE_EPILOG_MEM=(); SGE_EPILOG_VMEM=(); SGE_EPILOG_GPU=(); SGE_EPILOG_PHI=();
SGE_EPILOG_HOSTS[0]='n085'; SGE_EPILOG_NSLOTS[0]=1; SGE_EPILOG_MEM[0]=3221225472; SGE_EPILOG_VMEM[0]=0; SGE_EPILOG_GPU[0]=''; SGE_EPILOG_PHI[0]='';
SGE_EPILOG_JOB_MAXRT=0;
SGE_EPILOG_JOB_IS_STANDBY=0;
SGE_EPILOG_JOB_VMEM=0;
SGE_EPILOG_JOB_TRACELEVEL=0;
SGE_EPILOG_JOB_CONFIG_PHI_FOR_USER=0;