In order to schedule any job (interactively or batch) on a cluster, you must set your workgroup to define your cluster group or investing-entity compute nodes.
As discussed, an interactive job allows a user to enter a sequence of commands manually. The following qualify as being interactive jobs:
As far as the final bullet point goes, suppose a user has a long-running batch job and must later extract results from its output using a single command that will execute for a short time (say five minutes). While the user could go to the effort of creating a batch job, it may be easier to just run the command interactively and visually note its output.
All interactive jobs should be scheduled to run on the compute nodes, not the login/head node.
An interactive session (job) can often be made non-interactive (a batch job) by putting the input in a file, using the redirection symbols < and >, and making the entire command a line in a job script file:
program_name < input_command_file > output_command_file
The resulting non-interactive job can then be scheduled as a batch job.
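As a minimal sketch of the redirection pattern, the commands below stand in for a line of a job script. Here sort is only a placeholder for program_name, and both files are created in a temporary directory purely for illustration.

```shell
# Work in a throwaway directory so the example leaves nothing behind.
workdir=$(mktemp -d)

# The "input_command_file": what would otherwise be typed interactively.
printf 'pear\napple\nfig\n' > "$workdir/input_command_file"

# The job script line: stdin comes from one file, stdout goes to another.
# "sort" is a stand-in for program_name.
sort < "$workdir/input_command_file" > "$workdir/output_command_file"

cat "$workdir/output_command_file"
```

Because nothing is read from the keyboard, the same line can run unattended inside a batch job.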
Remember you must specify your workgroup to define your cluster group or investing-entity compute nodes before submitting any job, and this includes starting an interactive session. Now use the Grid Engine command qlogin on the login (head) node. Grid Engine will look for a node with a free scheduling slot (processor core) and a sufficiently light load, and then assign your session to it. If no such node becomes available, your qlogin request will eventually time out. The qlogin command results in a job in the workgroup interactive serial queue, <investing_entity>-qrsh.q.
Type
workgroup -g «investing-entity»
Type
qlogin
to reserve one scheduling slot and start an interactive shell on one of your workgroup investing-entity compute nodes.
Type
qlogin -pe threads 12
to reserve 12 scheduling slots and start an interactive shell on one of your workgroup investing-entity compute nodes.
Type
exit
to terminate the interactive shell and release the scheduling slot(s).
Use the login (head) node for interactive program development including Fortran, C, and C++ program compilation. Use Grid Engine (qlogin) to start interactive shells on your workgroup investing-entity compute nodes.
In Grid Engine, interactive jobs are submitted to the job scheduler using the qlogin command:
[(it_css:traine)@farber it_css]$ qlogin
Your job 78731 ("QLOGIN") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 78731 has been successfully scheduled.
Establishing /opt/shared/GridEngine/local/qlogin_ssh session to host n013 ...
[traine@n013 it_css]$
Dissecting this text, we see that:
- the job was assigned the job id 78731
- the interactive session was established on compute node n013
- the final line is a shell prompt on n013, waiting for commands to be typed

What is not apparent from the text:
- the shell on n013 has as its working directory the directory in which the qlogin command was typed (it_css)
- if resources are not immediately available, the output pauses at "waiting for interactive job to be scheduled ..." and later resumes with the message about its being successfully scheduled

To make the request fail rather than wait when it cannot be scheduled right away, use the -now command-line flag:
[(it_css:traine)@farber it_css]$ qlogin -now y
Your job 78735 ("QLOGIN") has been submitted
waiting for interactive job to be scheduled ...timeout (4 s) expired while waiting on socket fd 4
Your "qlogin" request could not be scheduled, try again later.
[(it_css:traine)@farber it_css]$
By default, an interactive job submitted using qlogin is given the name "QLOGIN". This can get confusing if a user has many interactive jobs submitted at one time. Taking a moment to name each interactive job according to its purpose may save the user a lot of effort later:
[(it_css:traine)@farber it_css]$ qlogin -N 'Matlab graphs'
Your job 78737 ("Matlab graphs") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 78737 has been successfully scheduled.
Establishing /opt/shared/GridEngine/local/qlogin_ssh session to host n013 ...
[traine@n013 it_css]$
The name provided with the -N command-line option will appear in job status listings (see the next section).
Prerequisite to the submission of batch jobs to the job scheduler is the writing of a job script. Grid Engine job scripts follow the same form as shell scripts, with a few exceptions:
- lines that begin with "#$" act like command-line options when the job is submitted

The first two points actually keep a job script "safer" because the script cannot be mistakenly executed on the head node. When a batch job is submitted, Grid Engine makes a copy of the script and removes the executable bit for exactly this reason.
The simplest possible job script looks something like this:
echo "Hello, world."
As Grid Engine is configured on UD clusters, this job script would be executed within a BASH shell. To use a different shell, the -S command-line option can be embedded in the job script:
#$ -S /bin/tcsh
echo "Hello, world."
Grid Engine provides the qsub command for scheduling batch jobs:
command | Action |
---|---|
qsub «command_line_options» «job_script» | Submit job with script command in the file «job_script» |
For example,
[(it_css:traine)@farber it_css]$ qsub job_script_01.qs
Your job 78742 ("job_script_01.qs") has been submitted
Notice that the job name defaults to the name of the job script; as discussed in the previous section, a job name can also be provided explicitly:
#$ -N testing002
echo "Hello, world."
which when submitted would yield
[(it_css:traine)@farber it_css]$ qsub job_script_02.qs
Your job 78745 ("testing002") has been submitted
It has already been demonstrated that command-line options to the qsub command can be embedded in a job script. Likewise, the options can be specified on the command line. For example:
[(it_css:traine)@farber it_css]$ qsub -N 'testingtoo' job_script_02.qs
Your job 78748 ("testingtoo") has been submitted
The -N option was provided in the job script and on the command line itself: Grid Engine will honor options from the command line in preference to those embedded in the script. Thus, in this case the "testingtoo" provided on the command line overrode the "testing002" from the job script.
The qsub command has many options available, all of which are documented in its man page. A few of the often-used options will be discussed here.
There are several default options that are automatically added to every qsub command by Grid Engine:
Option | Discussion |
---|---|
-j y | Regular (stdout) and error (stderr) output emitted by the job script should go to a single file |
-cwd | When the job executes, its working directory should be the working directory at the time of job submission |
-w w | Grid Engine checks submitted jobs to ensure that at least one queue will accept them; this option indicates that jobs with no valid queue produce a warning and remain queued |
There are default resource requirements supplied, as well, but they are beyond the scope of this section. Providing an alternate value for any of these arguments – in the job script or on the qsub command line – overrides the default value.
Since batch jobs can run unattended, the user may want to be notified of status changes for a job: when the job begins executing; when the job finishes; or if the job was killed. Grid Engine will deliver such notifications (as emails) to a job's owner if the owner requests them using the -m option. This option has a single argument, consisting of letters indicating the state changes for which notifications should be delivered:
Letter | State Change |
---|---|
b | The job has started executing |
e | The job has completed execution without error |
a | The job aborted or was rescheduled |
s | The job was suspended |
To receive notification when the job is finished – successfully or in error – the user would specify -m ea either on the command line or in the job script. The user should supply the target email address for these notifications using the -M option:
#$ -N 'Sample job'
#$ -m ea
#$ -M traine@gmail.com
echo "Hello, world."
Some jobs may only be eligible for execution after a certain date and time have passed. While a user could wait until that time has arrived to submit the job, Grid Engine also allows a job to be submitted with a requested start time. Grid Engine will do its best to meet that date and time. For example, on September 14 a user arranges with an external agent to copy a weather data file to the cluster around 6:00 p.m. on September 20. The user wishes to process the data (allowing 30 minutes for the file transfer to complete) as soon as possible. On September 14, the user could submit a batch job to be executed in the future:
[(it_css:traine)@farber it_css]$ qsub -a 201209201830 process_weather.qs
Your job 78758 ("process_weather.qs") has been submitted
where the argument to the -a option is in the form YYYYMMDDHHmm (year, month, day, hour, minute).
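The timestamp can also be computed rather than typed by hand. The sketch below assumes GNU date (its -d option is not portable to every system), and start_stamp is a hypothetical helper name, not part of Grid Engine.

```shell
# Print a qsub -a argument (YYYYMMDDHHmm) for a start time plus a delay.
# Assumes GNU date. $1 = "YYYY-MM-DD HH:MM", $2 = delay in minutes.
start_stamp() {
    epoch=$(date -d "$1" +%s) || return 1
    date -d "@$((epoch + $2 * 60))" +%Y%m%d%H%M
}

# File arrives around 6:00 p.m. on September 20, 2012; allow 30 minutes.
start_stamp "2012-09-20 18:00" 30
```

The result could then be passed along as qsub -a "$(start_stamp '2012-09-20 18:00' 30)" process_weather.qs.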
Just as important as executing the job is capturing any output produced by the job. As mentioned above, the -j y option sends all output (stdout and stderr) to a single file. By default, that output file is named according to the formula
[job name].o[job id]
For the weather-processing example above, the output would be found in process_weather.qs.o78758:
[(it_css:traine)@farber it_css]$ qsub -a 201209201830 process_weather.qs
Your job 78758 ("process_weather.qs") has been submitted
[(it_css:traine)@farber it_css]$ #
# ... some time goes by ...
#
[(it_css:traine)@farber it_css]$ ls *.o*
process_weather.qs.o78758
.o[job id]
suffix on the output file's name.
The name of the output file can be overridden using the -o command-line option to qsub. The argument to this option is the name of the file, possibly containing special characters that will be replaced by the job id, job name, etc. See the qsub man page for a complete description.
When error output is not joined to regular output (-j n), the error output is directed to a file named as described above but with a .e[job id] suffix. Likewise, an explicit filename can be provided using the -e option.
A user may mistakenly omit the script filename from the qsub command. Surprisingly, qsub does not complain in such a situation; instead, it pauses and allows the user to type a script:
[(it_css:traine)@farber it_css]$ qsub
#
# Oops, I forgot to provide a job script to qsub!
#
echo "Oops, I did it again."
^D
Your job 78774 ("STDIN") has been submitted
[(it_css:traine)@farber it_css]$ #
# ... some time goes by ...
#
[(it_css:traine)@farber it_css]$ cat STDIN.o78774
Oops, I did it again.
The "^D" represents holding down the "control" key and pressing the "D" key; this signals "end of file" and lets qsub know that the user is done entering lines of text. By default, a batch job submitted in this fashion will be named "STDIN".
For example,
qsub myproject.qs
or to submit a standby job that waits for idle nodes (up to 240 slots for 8 hours),
qsub -l standby=1 myproject.qs
or to submit a standby job that waits for idle 48-core nodes (if you are using a cluster with 48-core nodes like farber)
qsub -l standby=1 -q standby.q@@48core myproject.qs
or to submit a standby job that waits for idle 24-core nodes, (would not be assigned to any 48-core nodes; important for consistency of core assignment)
qsub -l standby=1 -q standby.q@@24core myproject.qs
or to submit to the four hour standby queue (up to 816 slots spanning all nodes)
qsub -l standby=1,h_rt=4:00:00 myproject.qs
or to submit to the four hour standby queue spanning just the 24-core nodes.
qsub -l standby=1,h_rt=4:00:00 -q standby-4h.q@@24core myproject.qs
The file myproject.qs will contain bash shell commands and qsub statements that include qsub options and resource specifications. The qsub statements begin with #$.
Reusable job scripts help you maintain a consistent batch environment across runs. The optional .qs filename suffix signifies a queue-submission script file.
In every batch session, Grid Engine sets environment variables that are useful within job scripts. Here are some common examples. The rest appear in the ENVIRONMENTAL VARIABLES section of the qsub man page.
Environment variable | Contains |
---|---|
HOSTNAME | Name of the execution (compute) node |
JOB_ID | Batch job id assigned by Grid Engine |
JOB_NAME | Name you assigned to the batch job (See Command options for qsub) |
NSLOTS | Number of scheduling slots (processor cores) assigned by Grid Engine to this job |
SGE_TASK_ID | Task id of an array job sub-task (See Array jobs) |
TMPDIR | Name of directory on the (compute) node scratch filesystem |
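
As an illustration of reading these variables, the sketch below prints a one-line summary from inside a job script. The function name job_summary and the fallback values are assumptions added so the script also runs outside Grid Engine.

```shell
#$ -N env_report
# Report where and how the job is running. The ${VAR:-default} fallbacks
# are illustrative only; inside a batch job Grid Engine sets these values.
job_summary() {
    echo "job ${JOB_NAME:-unnamed} (id ${JOB_ID:-none}) on ${HOSTNAME:-unknown} with ${NSLOTS:-1} slot(s)"
}

job_summary
```

Writing such a line at the top of a job's output file can make it easier to tell later which node and slot count produced a given result.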
When Grid Engine assigns one of your job's tasks to a particular node, it creates a temporary work directory on that node's 1-2 TB local scratch disk. And when the task assigned to that node is finished, Grid Engine removes the directory and its contents. The form of the directory name is
/scratch/[$JOB_ID].[$SGE_TASK_ID].«queue_name»
For example after qlogin
type
echo $TMPDIR
to see the name of the node scratch directory for this interactive job.
/scratch/71842.1.it_css-qrsh.q
See Filesystems and Computing environment for more information about the node scratch filesystem and using environment variables.
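A minimal sketch of using the node scratch directory from a job script follows. The fallback to /tmp and the file name intermediate.$$ are assumptions for illustration.

```shell
# Use the node-local scratch directory when Grid Engine provides one;
# fall back to /tmp (an assumption for running outside a job).
scratch="${TMPDIR:-/tmp}"

# Write intermediate data to a file unique to this shell process.
work_file="$scratch/intermediate.$$"
echo "partial result" > "$work_file"

# Copy anything worth keeping out of scratch before the job ends,
# because Grid Engine removes the directory and its contents.
cat "$work_file"
rm -f "$work_file"
```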
Grid Engine uses these environment variables' values when creating the job's output files:
File name pattern | Description |
---|---|
[$JOB_NAME].o[$JOB_ID] | Default output filename |
[$JOB_NAME].e[$JOB_ID] | Error filename (when not joined to output) |
[$JOB_NAME].po[$JOB_ID] | Parallel job output filename (empty for most queues) |
[$JOB_NAME].pe[$JOB_ID] | Parallel job error filename (usually empty) |
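
To make the naming patterns concrete, this small sketch lists the four default file names for a given job name and job id. The helper default_job_files is hypothetical and only formats strings; it does not query Grid Engine.

```shell
# Print the default file names for a job name ($1) and job id ($2),
# following the four patterns: .o, .e, .po and .pe plus the job id.
default_job_files() {
    for suffix in o e po pe; do
        echo "$1.$suffix$2"
    done
}

default_job_files process_weather.qs 78758
```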
The most commonly used qsub options fall into two categories: operational and resource-management. The operational options deal with naming the output files, mail notification of the processing steps, sequencing of a series of jobs, and establishing the UNIX environment. The resource-management options deal with the specific system resources you desire or need, such as parallel programming environments, number of processor cores, maximum CPU time, and virtual memory needed.
The table below lists qsub's common operational options.
Option / Argument | Function |
---|---|
-N «job_name» | Names the job «job_name». Default: the job script's full filename. |
-m {b|e|a|s|n} | Specifies when e-mail notifications of the job's status should be sent: beginning, end, abort, suspend. Default: never |
-M «email_address» | Specifies the email address to use for notifications. |
-j {y|n} | Joins (redirects) the STDERR results to STDOUT. Default: y (yes) |
-o «output_file» | Directs job output (STDOUT) to «output_file». Default: see Grid Engine environment variables |
-e «error_file» | Directs job errors (STDERR) to «error_file». File is only produced when the qsub option -j n is used. |
-hold_jid «job_list» | Holds job until the jobs named in «job_list» are completed. Jobs may be given as a comma-separated list of numeric job ids or job names. |
-t «task_id_range» | Used for array jobs. See Array jobs for details. |
Special notes for IT clusters: | |
-cwd | Default. Uses current directory as the job's working directory. |
-V | Ignored. Generally, the login node's environment is not appropriate to pass to a compute node. Instead, you must define the environment variables directly in the job script. |
-q «queue_name» | Not needed in most cases. Your choice of resource-management options determines the queue. |

The resource-management options for qsub have two common forms:
-l «resource»=«value»
-pe «parallel_environment» «Nproc»
For example, putting the lines
#$ -l h_cpu=1:30:00
#$ -pe threads 12
in the job script tells Grid Engine to set a hard limit of 1.5 hours on the CPU time resource for the job, and to assign 12 processors for your job.
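To check that a time value such as 1:30:00 really corresponds to the limit you intend, a small conversion helper can be useful. The function hms_to_seconds below is a hypothetical convenience, not a Grid Engine command.

```shell
# Convert an H:MM:SS time value to seconds.
# awk reads each field as a decimal number, so "00" and "08" are safe.
hms_to_seconds() {
    echo "$1" | awk -F: '{ print $1 * 3600 + $2 * 60 + $3 }'
}

hms_to_seconds 1:30:00
```

Here 1:30:00 yields 5400 seconds, which is the 1.5 hours described above.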
Grid Engine tries to satisfy all of the resource-management options you specify in a job script or as qsub command-line options. If there is a queue already defined that accepts jobs having that particular combination of requests, Grid Engine assigns your job to that queue.
You may give a resource request list in the form -l resource=value. A list of available resources with their associated valid value specifiers can be obtained by the command:
qconf -sc
Each named complex or shortcut can be a resource. There can be multiple, comma-separated resource=value pairs. The valid values are determined by the type. For example, a MEMORY type value could be 5G (5 gigabytes), and a TIME type value could be 1:30:00 (1 hour 30 minutes).
In a cluster as large as Farber, the two most important resources are cores (CPUs) and memory. The number of cores is called slots. It is listed as a "requestable" and "consumable" resource. Parallel jobs, by definition, can use multiple cores. Thus, the slots resource is handled by the parallel-environments option -pe, and you do not need to put it in a resource list.
For memory you will be concerned about how much is free. Memory resources come as both consumable and sensor driven (not consumable). For example:
memory resource | Consumable | Explanation |
---|---|---|
m_mem_free | Yes | Memory consumed per CPU DURING execution |
The m_mem_free resource is consumable, which means you are reserving the memory for future use. Other jobs that request m_mem_free may be barred from starting on the node. If you are specifying memory resources for a parallel environment job, the requested memory is multiplied by the slot count. If not specified, m_mem_free defaults to 1GB of memory per core (slot).
When using -pe threads, divide the total memory needed by the number of slots. For example, to request 48G of shared memory for an 8-thread job, request 6G per slot, i.e., -l m_mem_free=6G.
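Since the requested m_mem_free value is multiplied by the slot count, the per-slot figure is the total divided by the number of slots. The sketch below does that arithmetic in whole gigabytes, rounding up; per_slot_mem is a hypothetical helper name.

```shell
# Per-slot m_mem_free request for a threads job.
# $1 = total memory in GB, $2 = number of slots; rounds up to whole GB.
per_slot_mem() {
    echo "$(( ($1 + $2 - 1) / $2 ))G"
}

per_slot_mem 48 8
```

For 48 GB across 8 slots this gives 6G, which could be used as qsub -pe threads 8 -l m_mem_free="$(per_slot_mem 48 8)" myproject.qs.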
Consider 30 serial jobs, each of which requires 20 GB of memory. Use the command
qsub -l m_mem_free=20G -t 1-30 myjob.qs
This submits an array job of 30 tasks to the queue, with the SGE_TASK_ID variable set for use in the myjob.qs script.
The m_mem_free resource tells Grid Engine not to schedule a job on a node unless the specified amount of memory, i.e., 20GB per CPU, is available to consume on that node. Since each task is a serial job that runs on a single CPU, 20GB is the total memory available to the job.
The /opt/shared/templates/gridengine directory contains basic prototype job scripts for non-interactive parallel jobs. This section describes the -pe parallel environment option that's required for MPI jobs, OpenMP jobs and other jobs that use the SMP (threads) programming model.
Type the command:
qconf -spl
to display a list of parallel environments available on a cluster.
The general form of the parallel environment option is:
-pe «parallel_environment» «Nproc»
where «Nproc» is the number of processor slots (cores) requested. Just use a single number, and not a range. Grid Engine tries to locate as many free slots as it can and assigns them to that batch job. The environment variable $NSLOTS is given that value.
The two most used parallel environments are threads and mpi.
Jobs such as those having OpenMP directives use the threads parallel environment, an implementation of the shared-memory programming model. These SMP jobs can only use the cores on a single node.
For example, if your group only owns nodes with 24 cores, then your -pe threads request may only ask for 24 or fewer slots. Use Grid Engine's qconf command to determine the names and characteristics of the queues and compute nodes available to your investing-entity group on a cluster.
Threaded jobs do not necessarily complete faster when more slots are made available. Before running a series of production runs, you should experiment to determine how many slots generally perform best. Using that quantity will leave the remaining slots for others in your group to request. Remember: others can see how many slots you're using!
For OpenMP jobs, add the following bash command to your job script:
export OMP_NUM_THREADS=$NSLOTS
A job script template, openmp.qs, is available in /opt/shared/templates/gridengine/openmp to copy and customize for your OpenMP jobs.
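As a hedged sketch of how the thread count might be wired up in a job script, the fragment below requests four slots and passes the granted count to OpenMP. The slot count of 4 and the fallback of 1 are illustrative assumptions; the fallback only matters when the script is run outside Grid Engine.

```shell
#$ -pe threads 4
# Match the OpenMP thread count to the slots Grid Engine granted.
# The :-1 fallback is an assumption so the script also runs outside a job.
export OMP_NUM_THREADS="${NSLOTS:-1}"
echo "Using ${OMP_NUM_THREADS} OpenMP thread(s)"
```

Setting the variable from $NSLOTS rather than hard-coding a number keeps the program from starting more threads than the slots it was assigned.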
MPI jobs inherently generate considerable network traffic among the processor cores of a cluster's compute nodes. The processors on the compute node may be connected by two types of networks: InfiniBand and Gigabit Ethernet.
IT has developed templates to help with the openmpi parallel environments for Farber, targeting different user needs and architecture. You can copy the templates from /opt/shared/templates/gridengine/openmpi and customize them. These templates are essentially identical with the exception of the presence or absence of certain qsub options and the values assigned to MPI_FLAGS based on using particular environment variables. In all cases, the parallel environment option must be specified:
-pe mpi «NPROC»
where «NPROC» is the number of processor slots (cores) requested. Use a single number, not a range. Grid Engine tries to locate as many free slots as it can and assigns them to that job. The environment variable NSLOTS is given that value.
Job script templates are available in /opt/shared/templates/gridengine/openmpi to copy and customize for your Open MPI jobs. See Open MPI on Farber for more details about these job scripts.
Using the resource option -l nvidia_gpu=1 or -l gpu=1 will schedule your job on a host with a GPU co-processor and block any other jobs from using it at the same time.
Using the resource option -l intel_phi=1 or -l phi=1 will schedule your job on a host with a PHI co-processor and block any other jobs from using it at the same time.
The interactive and batch jobs discussed thus far have all been serial in nature: they exist as a sequence of instructions executed in order on a single CPU core. Many problems solved on a computer can be solved more quickly by breaking the job into pieces that can be solved concurrently. If one worker moves a pile of bricks from point A to point B in 30 minutes, then employing a second worker to carry bricks should see the job completed in just 15 minutes. Adding a third worker should decrease the time to 10 minutes. Job parallelism likewise coordinates between multiple serial workers to finish a computation more quickly than if it had been done by a single worker. Parallelism can take many forms, the two most prevalent being threading and message passing. Popular implementations of threading and message passing are the OpenMP and MPI standards.
Sometimes a more loosely-coupled form of parallelism can be used by a job. Suppose a user has a collection of 100 files, each containing the full text of a novel. The user would like to run a program for each file that counts the number of gerunds occurring in the text. The counting program is a simple serial program, but the task can be completed more quickly by analyzing many files concurrently. This form of parallelism requires no threading or message passing, and in Grid Engine parlance is called an array job.
Grid Engine uses parallel environments to facilitate the scheduling of jobs that use parallelism. Each queue has a list of parallel environments for which it will accept jobs; any job requesting a parallel environment not listed will not run in that queue. Available parallel environments are displayed using the qconf command:
[(it_css:traine)@farber it_css]$ qconf -spl
generic-mpi
mvapich2
openmpi
threads
Programs that use OpenMP or some other form of thread parallelism should use the "threads" parallel environment. This environment logically limits jobs to run on a single node only, which in turn limits the maximum number of workers to be the CPU core count for a node.
The "generic-mpi" parallel environment should be used in general for jobs that make use of MPI parallelism. This parallel environment spans multiple nodes and allocates workers by "filling up" one node before moving on to another. When a job starts in this parallel environment, an MPI "machines" file is automatically generated and placed in the job's temporary directory at ${TMPDIR}/machines. This file should be copied to a job's working directory or passed directly to the mpirun/mpiexec command used to execute the MPI program.
mpirun or mpiexec will often have arguments or environment variables which can be set to indicate on which hosts the job should run or what file to consult for that list. Please consult software manuals and online support resources before contacting UD IT for help determining how to pass this information to the program.
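Before handing the "machines" file to an MPI launcher, it can help to see how the slots are spread over hosts. The sketch below assumes the file holds one host name per line, repeated once per granted slot; that layout is an assumption, so check the actual file on your cluster. The helper slots_per_host and the sample host names are illustrative.

```shell
# Summarize a "machines" file as host:slot-count pairs.
# Assumes one host name per line, repeated once per granted slot.
slots_per_host() {
    sort "$1" | uniq -c | awk '{ print $2 ":" $1 }'
}

# Build a sample file; inside a job you would use "${TMPDIR}/machines".
sample=$(mktemp)
printf 'n013\nn013\nn014\n' > "$sample"
slots_per_host "$sample"
rm -f "$sample"
```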
Some MPI implementations are tightly integrated with Grid Engine and do not need a "machines" file. The "mvapich2" and "openmpi" parallel environments shown in the list above are two such examples. MPI programs compiled with these libraries should use the appropriate variant-specific MPI parallel environment.
After choosing the appropriate parallel environment for a job, the -pe option must be supplied to the qsub or qlogin command. This option has two required arguments: the parallel environment name and the number of workers requested:
qsub ... -pe openmpi 96 ...
With qsub, the parallel environment option can also be specified inside the job script using the #$ -pe … line format.
When a parallel job executes, the following environment variables will be set by Grid Engine:
Variable | Description |
---|---|
NSLOTS | The number of slots granted to the job. OpenMP jobs should assign the value of $NSLOTS to the OMP_NUM_THREADS environment variable, for example. |
NHOSTS | The number of hosts spanned by the job. |
Detailed information pertaining to individual kinds of parallel jobs – like setting the OMP_NUM_THREADS environment variable to $NSLOTS for OpenMP programs – is provided by UD IT in a collection of job template scripts on a per-cluster basis under the /opt/shared/templates directory. For example, on farber this directory looks like:
[(it_css:traine)@farber ~]$ ls -l /opt/shared/templates
total 4
drwxr-sr-x 7 frey _sgeadm 104 Jul 17 08:11 dev-projects
drwxrwsr-x 3 frey _sgeadm 43 Apr 13 08:38 gaussian
drwxrwsr-x 3 frey _sgeadm 38 Apr 13 08:38 generic-mpi
drwxrwsr-x 3 frey _sgeadm 34 Apr 13 08:38 gromacs
drwxrwsr-x 3 frey _sgeadm 35 Apr 13 08:38 mvapich2
drwxrwsr-x 3 frey _sgeadm 33 Apr 13 08:38 openmp
drwxrwsr-x 3 frey _sgeadm 84 Sep 10 10:11 openmpi
-rw-rw-r-- 1 frey _sgeadm 536 Apr 13 08:38 serial.qs
The directory layout is self-explanatory: script templates specific to OpenMP, Open MPI, and MVAPICH2 are in their own subdirectories; a generic MPI job script can be found in the generic-mpi directory; a template for serial jobs is in serial.qs. The scripts are heavily documented to aid in users' choice of appropriate templates.
An array job runs the same job script many times, each run as a separate task. For each task, Grid Engine sets the environment variable SGE_TASK_ID to a sequence number, and its value provides input to the job submission script.
$SGE_TASK_ID is the key to making array jobs useful. Use it in your bash script, or pass it as a parameter so your program can decide how to complete the assigned task.
For example, the $SGE_TASK_ID sequence values of 2, 4, 6, …, 5000 might be passed as an initial data value to 2500 repetitions of a simulation model. Alternatively, each iteration (task) of a job might use a different data file with filenames of data$SGE_TASK_ID (i.e., data1, data2, data3, …, data2000).
The general form of the qsub option is:
-t start_value-stop_value:step_size
with a default step_size of 1. For these examples, the options would be:
-t 2-5000:2 and -t 1-2000
Additional simple how-to examples for array jobs.
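To double-check a -t range before submitting, the number of tasks it generates can be computed directly. In this sketch count_tasks is a hypothetical helper, and the fallback task id of 1 is an assumption for running the last line outside an array job.

```shell
# Number of tasks generated by -t start-stop:step (step defaults to 1).
count_tasks() {
    echo $(( ($2 - $1) / ${3:-1} + 1 ))
}

count_tasks 2 5000 2   # the -t 2-5000:2 example
count_tasks 1 2000     # the -t 1-2000 example

# Inside a task, build the data file name from the task id.
# The :-1 fallback is an assumption for running outside an array job.
echo "data${SGE_TASK_ID:-1}"
```

The two calls give 2500 and 2000 tasks, matching the simulation and data-file examples.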
If you have multiple jobs and want some of them to run automatically after another job finishes, you can use chaining. When you chain jobs, remember to check the status of the earlier job to determine whether it completed successfully. This will prevent the system from flooding the scheduler with failed jobs. Here is a simple chaining example with three job scripts doThing1.qs, doThing2.qs and doThing3.qs.
#$ -N doThing1
#
# If you want an email message to be sent to you when your job ultimately
# finishes, edit the -M line to have your email address and change the
# next two lines to start with #$ instead of just #
# -m eas
# -M my_address@mail.server.com
#
# Setup the environment; add vpkg_require commands after this
# line:
# Now append all of your shell commands necessary to run your program
# after this line:
./dotask1

#$ -N doThing2
#$ -hold_jid doThing1
#
# If you want an email message to be sent to you when your job ultimately
# finishes, edit the -M line to have your email address and change the
# next two lines to start with #$ instead of just #
# -m eas
# -M my_address@mail.server.com
#
# Setup the environment; add vpkg_require commands after this
# line:
# Now append all of your shell commands necessary to run your program
# after this line:
# Here is where you should add a test to make sure
# that dotask1 successfully completed before running
# ./dotask2
# You might check if a specific file(s) exists that you would
# expect after a successful dotask1 run, something like this
#  if [ -e dotask1.log ]
#  then ./dotask2
#  fi
# If dotask1.log does not exist it will do nothing.
# If you don't need a test, then you would run the task.
./dotask2

#$ -N doThing3
#$ -hold_jid doThing2
#
# If you want an email message to be sent to you when your job ultimately
# finishes, edit the -M line to have your email address and change the
# next two lines to start with #$ instead of just #
# -m eas
# -M my_address@mail.server.com
#
# Setup the environment; add vpkg_require commands after this
# line:
# Now append all of your shell commands necessary to run your program
# after this line:
# Here is where you should add a test to make sure
# that dotask2 successfully completed before running
# ./dotask3
# You might check if a specific file(s) exists that you would
# expect after a successful dotask2 run, something like this
#  if [ -e dotask2.log ]
#  then ./dotask3
#  fi
# If dotask2.log does not exist it will do nothing.
# If you don't need a test, then just run the task.
./dotask3
Now submit all three job scripts. In this example, we are using account traine in workgroup it_css on farber.
[(it_css:traine)@farber ~]$ qsub doThing1.qs
[(it_css:traine)@farber ~]$ qsub doThing2.qs
[(it_css:traine)@farber ~]$ qsub doThing3.qs
The basic flow is that doThing2 will wait until doThing1 finishes, and doThing3 will wait until doThing2 finishes. If you test for success, then doThing2 will check to make sure that doThing1 was successful before running, and doThing3 will check to make sure that doThing2 was successful before running.
You might also want to have doThing1 and doThing2 execute at the same time, and only run doThing3 after they finish. In this case you will need to change the doThing2 and doThing3 scripts and tests.
#$ -N doThing2
#
# If you want an email message to be sent to you when your job ultimately
# finishes, edit the -M line to have your email address and change the
# next two lines to start with #$ instead of just #
# -m eas
# -M my_address@mail.server.com
#
# Setup the environment; add vpkg_require commands after this
# line:
# Now append all of your shell commands necessary to run your program
# after this line:
./dotask2

#$ -N doThing3
#$ -hold_jid doThing1,doThing2
#
# If you want an email message to be sent to you when your job ultimately
# finishes, edit the -M line to have your email address and change the
# next two lines to start with #$ instead of just #
# -m eas
# -M my_address@mail.server.com
#
# Setup the environment; add vpkg_require commands after this
# line:
# Now append all of your shell commands necessary to run your program
# after this line:
# Here is where you should add a test to make sure
# that dotask1 and dotask2 successfully completed before running
# ./dotask3
# You might check if a specific file(s) exists that you would
# expect after a successful dotask1 and dotask2 run, something like this
#  if [ -e dotask1.log -a -e dotask2.log ];
#  then ./dotask3
#  fi
# If both files do not exist it will do nothing.
# If you don't need a test, then just run the task.
./dotask3
Now submit all three jobs again. This time, however, doThing1 and doThing2 will run at the same time, and doThing3 will run only when both are finished. doThing3 will check to make sure doThing1 and doThing2 were successful before running.
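The file-existence test suggested in the script comments can be wrapped in a small reusable function. This is only a sketch: run_if_present is a hypothetical helper, and using log files as success markers is the convention taken from the comments above, not something Grid Engine enforces.

```shell
# Run a command only if every listed marker file exists.
# $1 = command to run, remaining arguments = files that must exist.
run_if_present() (
    cmd=$1
    shift
    for marker in "$@"; do
        if [ ! -e "$marker" ]; then
            echo "missing $marker; not running $cmd" >&2
            exit 1
        fi
    done
    "$cmd"
)
```

In doThing3.qs the final line could then read run_if_present ./dotask3 dotask1.log dotask2.log, so the task is skipped with a message when either earlier step left no log.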
Hearkening back to the text-processing example cited above, the analysis of each of the 100 files could be performed by submitting 100 separate jobs to Grid Engine, each modified to work on a different file. Using an array job helps to automate this task: each sub-task of the array job is assigned a unique integer identifier. Each sub-task can find its sub-task identifier in the SGE_TASK_ID environment variable. Consider the following:
[(it_css:traine)@farber it_css]$ qsub -N array -t 1-4 -o 'array.$TASK_ID'
echo "I am sub-task ${JOB_ID}.${SGE_TASK_ID}"
^D
Your job-array 82709.1-4:1 ("array") has been submitted
[(it_css:traine)@farber it_css]$ ...time passes...
[(it_css:traine)@farber it_css]$ ls -1 array.*
array.1
array.2
array.3
array.4
[(it_css:traine)@farber it_css]$ cat array.3
I am sub-task 82709.3
Four sub-tasks are executed, numbered from 1 through 4. The starting index must be greater than zero, and the ending index must be greater than or equal to the starting index. The step size going from one index to the next defaults to one, but can be any positive integer. A step size is appended to the sub-task range as in 2-20:2 – proceed from 2 up to 20 in steps of 2, i.e., 2, 4, 6, 8, 10, and so on.
There are essentially two methods for partitioning input data for array jobs. Both methods make use of the sub-task identifier in locating the input for a particular sub-task.
If the 100 novels were in files with names fitting the pattern novel_«sub-task-id».txt then the analysis could be performed with the following qsub command:
[(it_css:traine)@farber novels]$ qsub -N gerunds -o 'gerund_count.$TASK_ID' -t 1-100
#
# Count gerunds in the file:
#
./gerund_count "novel_${SGE_TASK_ID}.txt"
^D
Your job-array 82715.1-100:1 ("gerunds") has been submitted
When complete, the job will produce 100 files named gerund_count.«sub-task-id», where the sub-task-id ties each result to its input file.
An alternate method of organizing the chaos associated with large array jobs is to partition the data in directories: the sub-task identifier is not applied to the filenames, but is used to set the working directory for each sub-task:
[(it_css:traine)@farber novels]$ qsub -N gerunds -o gerund_count -t 1-100
#
# Count gerunds in the file:
#
cd ${SGE_TASK_ID}
../gerund_count novel.txt > gerund_count
^D
Your job-array 82716.1-100:1 ("gerunds") has been submitted
When complete, each directory will have a file named gerund_count containing the output of the gerund_count command.
The partitioning scheme can be as complex as the user desires. If the directories were not named "1" through "100" but instead used the name of the novel contained within, an index file could be created containing the directory names, one per line:
Great_Expectations
Atlas_Shrugged
The_Great_Gatsby
:
The job submission might then look like:
[(it_css:traine)@farber novels]$ qsub -N gerunds -o gerund_count -t 1-100
#
# Count gerunds in the file:
#
NOVEL_FOR_TASK=`sed -n ${SGE_TASK_ID}p index.txt`
cd $NOVEL_FOR_TASK
../gerund_count novel.txt > gerund_count
^D
Your job-array 82718.1-100:1 ("gerunds") has been submitted
The sed command selects a single line of the index.txt file; for sub-task 1 the first line is selected, sub-task 2 the second line, etc.
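The line-selection step can be tried on its own, outside a job. The sketch below builds a small sample index file and picks the entry for a given task id; novel_for_task is a hypothetical helper name and the temporary file stands in for index.txt.

```shell
# Print line N of an index file, where N is the sub-task id.
# $1 = task id, $2 = index file.
novel_for_task() {
    sed -n "${1}p" "$2"
}

# Sample index file for illustration.
index=$(mktemp)
printf 'Great_Expectations\nAtlas_Shrugged\nThe_Great_Gatsby\n' > "$index"
novel_for_task 2 "$index"
rm -f "$index"
```

With this sample file, task id 2 selects Atlas_Shrugged, the second line.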