====== The geco-rsrcinfo Command ======

The ''geco-rsrcinfo'' command makes use of the job resource lookup functionality in the GECO core library to:

  * parse the output of ''qstat -xml -j #.#'' (either provided on ''stdin'' or executed by ''geco-rsrcinfo'' itself)
  * output a chunk of bash scripting that when executed adds job resource information to the environment
  * serialize job resource information to a file
  * unserialize job resource information from a file

The command is primarily meant to be used by GECO's Grid Engine prolog and epilog scripts, but may also have some value within the context of job scripts.

  usage:
    geco-rsrcinfo {options} [task-id]
  options:
    -h/--help                    show this information
    -v/--verbose                 increase the verbosity level (may be used
                                 multiple times)
    -q/--quiet                   decrease the verbosity level (may be used
                                 multiple times)
    -m/--mode=[mode]             operate in the given mode:
                                   prolog:   SGE prolog script
                                   epilog:   SGE epilog script
                                   userenv:  user environment
    -p/--prolog                  shorthand for --mode=prolog
    -e/--epilog                  shorthand for --mode=epilog
    -o/--only                    return information for the native host only,
                                 not an array of hosts
    -H/--host=[hostname]         return information for the specified host only,
                                 not an array of hosts
    -j/--jobid=[job_id]          request info for a specific job id (without this
                                 option, qstat output is expected on stdin)
    -s/--serialize=[path]        rather than displaying to stdout, serialize the
                                 resource information to the given filepath
    -u/--unserialize=[path]      unserialize resource information in the given
                                 filepath and display it
    -r/--qstat-retry=#           if qstat fails to return data for a job, retry
                                 this many times

The ''[task-id]'' is in reference to array jobs. If no ''[task-id]'' is provided it defaults to 1 (the implicit task id for non-array jobs).

Since the output of a ''qstat'' command is necessary, any node on which ''geco-rsrcinfo'' will be run must be allowed to query the qmaster and must have the ''qstat'' command available to it.
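
Within a job script, the bash output produced in ''userenv'' mode can be captured and evaluated only when the command exits with status zero. The following is a minimal sketch under stated assumptions, not part of GECO itself: the function name ''load_job_resources'' and the ''GECO_RSRCINFO'' override variable are hypothetical, and bash is required because the generated code uses arrays.

```shell
#!/usr/bin/env bash
# Sketch only: load_job_resources and GECO_RSRCINFO are hypothetical names,
# not part of GECO. GECO_RSRCINFO merely lets the command path be overridden.

# load_job_resources JOB_ID [TASK_ID]
# Loads the "user environment" resource variables for one task into the
# calling shell. TASK_ID defaults to 1, the same default geco-rsrcinfo uses.
load_job_resources() {
    local job_id="$1" task_id="${2:-1}" code
    # Capture the generated bash code. On a non-zero exit status, return
    # that status to the caller without evaluating anything.
    code="$("${GECO_RSRCINFO:-geco-rsrcinfo}" --mode=userenv --jobid="$job_id" "$task_id")" || return $?
    # The assignments in the generated code are not declared local, so they
    # become visible to the caller.
    eval "$code"
}
```

A call such as ''load_job_resources 310145 30'' would then leave variables like ''SGE_RESOURCE_HOSTS'' set in the calling shell, or return the non-zero status of ''geco-rsrcinfo'' if the lookup failed.
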
===== Examples =====

Output bash code that sets up the "user environment" variables with all resource information for task 30 of job 310145:

  [frey@farber ~]$ geco-rsrcinfo --mode=userenv --jobid=310145 30
  SGE_RESOURCE_HOSTS=(); SGE_RESOURCE_NSLOTS=(); SGE_RESOURCE_MEM=(); SGE_RESOURCE_VMEM=(); SGE_RESOURCE_GPU=(); SGE_RESOURCE_PHI=();
  SGE_RESOURCE_HOSTS[0]='n085'; SGE_RESOURCE_NSLOTS[0]=1; SGE_RESOURCE_MEM[0]=3221225472; SGE_RESOURCE_VMEM[0]=0; SGE_RESOURCE_GPU[0]=''; SGE_RESOURCE_PHI[0]='';
  SGE_RESOURCE_JOB_MAXRT=0; SGE_RESOURCE_JOB_IS_STANDBY=0; SGE_RESOURCE_JOB_VMEM=0; SGE_RESOURCE_JOB_TRACELEVEL=0; SGE_RESOURCE_JOB_CONFIG_PHI_FOR_USER=0;
  [frey@farber ~]$ echo $?
  0

Since ''geco-rsrcinfo'' returns zero, no issues were encountered while finding the information and the output can be ''eval'''ed by the shell to effect the desired changes. Use of the ''--jobid'' flag implies that ''geco-rsrcinfo'' issued the ''qstat'' command itself (versus waiting for ''qstat'' output on ''stdin'').

The ''prolog'' and ''epilog'' modes of operation produce additional bash commands that are used by GECO's Grid Engine prolog and epilog scripts to write resource summaries to a job's standard output file:

  [frey@farber ~]$ eval "$(geco-rsrcinfo --mode=prolog --jobid=310145 30)"
  [PROLOG] Resource allocation summary
  [PROLOG] 1 core on "n085"
  [PROLOG] Memory limit: 3221225472 bytes
  [PROLOG] nVidia GPU:
  [PROLOG] Intel Phi:

==== Verbosity ====

Increasing the verbosity level displays additional output (informational and debugging messages). For example, requesting resource information for a nonexistent job:

  [frey@farber ~]$ geco-rsrcinfo -vvv --prolog --jobid=1 12
  2015-12-22T12:44:19-0500 [22554|DEBUG]:(GECOResource.c:221) executing for task 12 popen("/opt/shared/univa/current/bin/lx-amd64/qstat -xml -j 1", "r")...
  ERROR: job 1.12 is not known to the qmaster
  [frey@farber ~]$ echo $?
  2

Doing likewise, but for a nonexistent task that is part of an extant job:

  [frey@farber ~]$ geco-rsrcinfo -vvv --prolog --jobid=310145 128
  2015-12-22T12:55:17-0500 [40737|DEBUG]:(GECOResource.c:221) executing for task 128 popen("/opt/shared/univa/current/bin/lx-amd64/qstat -xml -j 310145", "r")...
  2015-12-22T12:55:17-0500 [40737|WARN ]:GECOResourceSetCreate: qstat returned inadequate job information for 310145.128 (reason = 8); sleeping then retrying
  2015-12-22T12:55:18-0500 [40737|DEBUG]:(GECOResource.c:221) executing for task 128 popen("/opt/shared/univa/current/bin/lx-amd64/qstat -xml -j 310145", "r")...
  2015-12-22T12:55:18-0500 [40737|WARN ]:GECOResourceSetCreate: qstat returned inadequate job information for 310145.128 (reason = 8); sleeping then retrying
  2015-12-22T12:55:20-0500 [40737|DEBUG]:(GECOResource.c:221) executing for task 128 popen("/opt/shared/univa/current/bin/lx-amd64/qstat -xml -j 310145", "r")...
  ERROR: resource information not available for job 310145.128; it either does not exist or is not running
  [frey@farber ~]$ echo $?
  22

==== Serializing/Unserializing Information ====

GECO's resource information serialization uses a fixed, typed text format that is easy to parse, which keeps things simple and fast.

  [frey@farber ~]$ geco-rsrcinfo --jobid=310145 --serialize=./310145.jobdata 30
  [frey@farber ~]$ echo $?
  0
  [frey@farber ~]$ cat 310145.jobdata
  GECOResourceSet_v1{li310145,li30,lf0.000000,b0,lf0.000000,i0,i1,b1,b0,s8:Disc7_5.,s7:calhoun,s8:hmichael,s10:/home/1429,s4:n085{b0,i1,lf3221225472.000000,lf0.000000,s0:,s0:}}

Rather than querying via ''qstat'', the job resource information can later be reconstituted in the context of GECO:

  [frey@farber ~]$ geco-rsrcinfo --unserialize=./310145.jobdata --epilog
  SGE_EPILOG_HOSTS=(); SGE_EPILOG_NSLOTS=(); SGE_EPILOG_MEM=(); SGE_EPILOG_VMEM=(); SGE_EPILOG_GPU=(); SGE_EPILOG_PHI=();
  SGE_EPILOG_HOSTS[0]='n085'; SGE_EPILOG_NSLOTS[0]=1; SGE_EPILOG_MEM[0]=3221225472; SGE_EPILOG_VMEM[0]=0; SGE_EPILOG_GPU[0]=''; SGE_EPILOG_PHI[0]='';
  SGE_EPILOG_JOB_MAXRT=0; SGE_EPILOG_JOB_IS_STANDBY=0; SGE_EPILOG_JOB_VMEM=0; SGE_EPILOG_JOB_TRACELEVEL=0; SGE_EPILOG_JOB_CONFIG_PHI_FOR_USER=0;
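
The serialize and unserialize steps shown above can be paired so that a job's resource information is looked up once and reloaded later without running ''qstat'' again. The sketch below illustrates one way this might be wrapped in bash; it is not taken from GECO's own prolog or epilog scripts. The function names and the ''GECO_RSRCINFO'' override variable are hypothetical, and only the ''--jobid'', ''--serialize'', ''--unserialize'' and ''--epilog'' options from the usage summary are used.

```shell
#!/usr/bin/env bash
# Sketch only: save_job_resources, restore_job_resources and GECO_RSRCINFO
# are hypothetical names, not part of GECO.

# save_job_resources JOB_ID TASK_ID FILE
# Looks up the job's resources (geco-rsrcinfo runs qstat itself because
# --jobid is given) and writes the serialized record to FILE.
save_job_resources() {
    local job_id="$1" task_id="$2" file="$3"
    "${GECO_RSRCINFO:-geco-rsrcinfo}" --jobid="$job_id" --serialize="$file" "$task_id"
}

# restore_job_resources FILE
# Rebuilds the SGE_EPILOG_* variables in the calling shell from FILE.
restore_job_resources() {
    local file="$1" code
    # Fail early if the serialized record is missing or unreadable.
    [ -r "$file" ] || return 1
    # Only evaluate the generated code when the command succeeds.
    code="$("${GECO_RSRCINFO:-geco-rsrcinfo}" --unserialize="$file" --epilog)" || return $?
    eval "$code"
}
```

For example, ''save_job_resources 310145 30 ./310145.jobdata'' followed later by ''restore_job_resources ./310145.jobdata'' would mirror the two commands shown in this section.
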