abstract:farber:runjobs:schedule_jobs (revised 2018-10-08 16:07 by anita; last revised 2019-02-22 11:36 by ssunkara)
====== Scheduling Jobs on Farber ======
In order to schedule any job (interactively or batch) on a cluster, you must set your **workgroup**.
===== Interactive jobs (qlogin) =====
As discussed, an //interactive job// is scheduled through the job scheduler just like a batch job.
Use the login (head) node for interactive program development including Fortran, C, and C++ program compilation. Use Grid Engine (**qlogin**) to start interactive shells on your workgroup //compute nodes//.
==== Submitting an Interactive Job ====
In Grid Engine, interactive jobs are submitted to the job scheduler using the ''qlogin'' command.
===== Batch Jobs (qsub) =====
Prerequisite to the submission of //batch jobs// to the job scheduler is the writing of a //job script//.
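As a sketch of what such a script can contain (the program name is hypothetical, and real scripts should be patterned after the templates mentioned later on this page), a minimal Grid Engine job script looks like an ordinary shell script with ''%%#$%%'' option lines:

```bash
#!/bin/bash
#
# Minimal job-script sketch. The '#$' lines are qsub options embedded
# in the script; everything else runs on the assigned compute node.
#$ -N myproject        # job name (becomes $JOB_NAME)
#$ -cwd                # run the job from the directory of submission
#$ -j y                # merge stderr into the stdout file

echo "Starting job ${JOB_NAME:-myproject}"
# ./myprogram          # a hypothetical executable would run here
STATUS="finished"      # recorded only so the sketch has a visible result
echo "Job $STATUS"
```

Grid Engine parses the ''%%#$%%'' lines as if they had been given on the ''qsub'' command line.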
==== Submitting the Job ====
Grid Engine provides the **qsub** command for scheduling batch jobs:
+ | |||
+ | ^ command ^ Action ^ | ||
+ | | '' | ||
+ | |||
+ | For example, | ||
<code bash>
qsub myproject.qs
</code>
where the argument to the ''qsub'' command is the name of your job script file.
==== Job Output ====
Equally important as executing the job is capturing any output produced by the job. As mentioned above, the default output filenames are constructed from the job's name and id.
==== Forgetting the Filename ====
A user may mistakenly omit the script filename from the ''qsub'' command line; ''qsub'' then waits to read the script from standard input, which can look like a hung terminal session.
===== More details about using qsub =====
For example,

  qsub myproject.qs

or to submit a standby job that waits for idle nodes (up to 240 slots for 8 hours),

  qsub -l standby=1 myproject.qs

or to submit a standby job that waits for idle 48-core nodes (if you are using a cluster with 48-core nodes, like farber),

  qsub -l standby=1 -q standby.q@@48core myproject.qs

or to submit a standby job that waits for idle 24-core nodes (such a job would not be assigned to any 48-core nodes, which is important for consistency of core assignment),

  qsub -l standby=1 -q standby.q@@24core myproject.qs

or to submit to the four hour standby queue (up to 816 slots spanning all nodes),

  qsub -l standby=1,

or to submit to the four hour standby queue spanning just the 24-core nodes,

  qsub -l standby=1,

The file ''myproject.qs'' is your job script.

<note tip>
We strongly recommend that you use a script file patterned after the prototypes in **/

Reusable job scripts help you maintain a consistent batch environment across runs. The optional **.qs** filename suffix signifies a **q**ueue-**s**ubmission script file.
</note>

==== Grid Engine environment variables ====

In every batch session, Grid Engine sets environment variables that are useful within job scripts. Here are some common examples; the rest appear in the ENVIRONMENTAL VARIABLES section of the **qsub** man page.

^ Environment variable ^ Contains ^
| **HOSTNAME** | Name of the execution (compute) node |
| **JOB_ID** | Batch job id assigned by Grid Engine |
| **JOB_NAME** | Name you assigned to the batch job (see the ''-N'' option) |
| **NSLOTS** | Number of //slots// (processor cores) assigned to the job |
| **SGE_TASK_ID** | Task id of an array job sub-task (see [[#array_jobs|Array jobs]]) |
| **TMPDIR** | Name of a directory on the compute node's scratch filesystem |
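A job script can log these variables for later debugging; a small sketch (the ''unset'' fallbacks only matter when the snippet runs outside a real batch session, where Grid Engine has not set them):

```bash
# Record the Grid Engine environment for debugging; values fall back
# to "unset" when not running under the scheduler.
SUMMARY="job=${JOB_NAME:-unset} id=${JOB_ID:-unset} slots=${NSLOTS:-unset} task=${SGE_TASK_ID:-unset}"
echo "$SUMMARY"
```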
+ | |||
+ | When Grid Engine assigns one of your job's tasks to a particular node, it creates a temporary work directory on that node's 1-2 TB local scratch disk. And when the task assigned to that node is finished, Grid Engine removes the directory and its contents. The form of the directory name is | ||
+ | |||
+ | **/ | ||
+ | |||
For example, after ''qlogin'' type
<code bash>
echo $TMPDIR
</code>
to see the name of the node scratch directory for this interactive job.
<code>
/
</code>
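The node-local scratch directory is typically used to stage data close to the computation. A sketch of that stage-in/compute/stage-out pattern (file names are hypothetical, ''tr'' stands in for the real computation, and a ''mktemp'' fallback is added so the sketch runs outside a real job, where Grid Engine would set ''TMPDIR'' itself):

```bash
# Stage-in / compute / stage-out through node-local scratch.
TMPDIR="${TMPDIR:-$(mktemp -d)}"              # Grid Engine sets this in a real job
WORKDIR="$PWD"
echo "hello scratch" > "$WORKDIR/input.dat"   # demo input file (hypothetical name)
cp "$WORKDIR/input.dat" "$TMPDIR/"            # stage in to fast local disk
( cd "$TMPDIR" && tr 'a-z' 'A-Z' < input.dat > output.dat )  # stand-in computation
cp "$TMPDIR/output.dat" "$WORKDIR/"           # stage out before the job ends
```

Staging out matters because Grid Engine deletes the scratch directory when the task finishes.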
+ | |||
+ | Grid Engine uses these environment variables' | ||
+ | |||
^ File name pattern ^ Description ^
| [$JOB_NAME].o[$JOB_ID] | Default **output** filename |
| [$JOB_NAME].e[$JOB_ID] | **Error** filename (when not joined to output) |
| [$JOB_NAME].po[$JOB_ID] | Parallel job **output** filename (empty for most queues) |
| [$JOB_NAME].pe[$JOB_ID] | Parallel job **error** filename (usually empty) |
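These patterns can be reproduced in a script, for example to locate a finished job's output (the job name and id below are illustrative; in a real job Grid Engine sets both variables):

```bash
# Build the default output/error filenames from the Grid Engine variables.
JOB_NAME="myproject"               # illustrative value
JOB_ID=12345                       # illustrative value
OUTFILE="${JOB_NAME}.o${JOB_ID}"   # default stdout file
ERRFILE="${JOB_NAME}.e${JOB_ID}"   # stderr file when not joined to output
echo "$OUTFILE $ERRFILE"
```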
+ | |||
==== More options for qsub ====

The most commonly used **qsub** options fall into two categories: //operational// and //resource-management//.

The table below lists **qsub**'s most commonly used options.

^ Option / Argument ^ Function ^
| ''-N <job_name>'' | Assign a name to the job (sets **JOB_NAME**) |
| ''-q <queue>'' | Direct the job to a particular queue, e.g. ''standby.q@@24core'' |
| ''-t <task_range>'' | Submit an //array job//; each task gets its own **SGE_TASK_ID** |
^ The resource-management options for ''qsub'': ^^
| ''-l <resource>=<value>'' | Request a resource for the job, e.g. ''h_cpu'' or ''standby'' |
| ''-pe <parallel_environment> <Nproc>'' | Request a parallel environment and a number of slots |

For example, putting the lines
<code>
#$ -l h_cpu=1:30:00
#$ -pe threads 12
</code>
in the job script tells Grid Engine to set a hard limit of 1.5 hours (1:30:00) on the CPU time resource for the job, and to assign 12 processors to your job.

Grid Engine tries to satisfy all of the resource-management options you specify in a job script or as ''qsub'' command-line options. If there is a queue already defined that accepts jobs having that particular combination of requests, Grid Engine assigns your job to that queue.
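Because ''%%#$%%'' lines are just embedded qsub options, the same request can be assembled on the command line instead. A sketch that only builds and prints the command, so nothing is actually submitted:

```bash
# Command-line equivalent of the '#$' directives in the example above.
CMD="qsub -l h_cpu=1:30:00 -pe threads 12 myproject.qs"
echo "$CMD"   # printed rather than executed in this sketch
```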
===== Resource-management options on Farber =====
For memory, you will be concerned with how much is free. Memory resources come as both consumable and sensor-driven (not consumable).
^ memory resource ^ Consumable ^ Explanation ^
| m_mem_free | Yes | Memory consumed by the job, requested per slot |
The ''m_mem_free'' resource is consumable: the amount you request is set aside for your job when it is scheduled.
<note tip>When using a shared memory parallel computing environment such as ''threads'', remember that ''m_mem_free'' is requested per slot, so a node must have the per-slot request times the number of slots available.</note>
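Since ''m_mem_free'' is requested per slot, the memory a node must have free scales with the slot count. A quick sketch of that arithmetic (example numbers only):

```bash
# Per-slot memory requests multiply by the slot count on one node.
SLOTS=12          # e.g. from '-pe threads 12'
PER_SLOT_GB=2     # e.g. from '-l m_mem_free=2G'
TOTAL_GB=$((SLOTS * PER_SLOT_GB))
echo "${TOTAL_GB}G must be free on the node"
```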
<note warning>
<code>
qsub -l m_mem_free=20G -t 1-30 myjob.qs
</code>
This will submit 30 jobs to the queue, with the SGE_TASK_ID variable set for use in the ''myjob.qs'' script. The ''-t'' option specifies the task id range, and the ''-l m_mem_free=20G'' option requests 20 GB of free memory per slot for each task.
==== Parallel environments ====
=== The threads parallel environment ===
Jobs such as those having OpenMP directives use the **//threads//** parallel environment.

For example, if your group only owns nodes with 24 cores, then your ''-pe threads'' request can ask for at most 24 slots, since all of a shared-memory job's threads must run on a single node.
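Inside a threads-PE job script, the granted slot count is available as ''NSLOTS''; a common sketch is to hand it to OpenMP (a fallback of 1 is added so the sketch also runs outside a real job):

```bash
# Match the OpenMP thread count to the slots Grid Engine granted.
NSLOTS="${NSLOTS:-1}"             # set by Grid Engine in a real job
export OMP_NUM_THREADS="$NSLOTS"  # OpenMP programs read this at start-up
echo "running with $OMP_NUM_THREADS threads"
```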
==== Array jobs ====
An array job essentially runs the same job many times, once per task. Each time, Grid Engine sets the environment variable **SGE_TASK_ID** to the task's sequence number, and the job script can use that value to vary its input.
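A job script can turn ''SGE_TASK_ID'' into per-task file names; a sketch (the file-naming scheme is hypothetical, with a fallback of 1 so the sketch runs outside a job):

```bash
# Derive per-task input/output names from the array task id.
SGE_TASK_ID="${SGE_TASK_ID:-1}"     # set by Grid Engine for '-t' array jobs
INFILE="data.${SGE_TASK_ID}.in"
OUTFILE="data.${SGE_TASK_ID}.out"
echo "task ${SGE_TASK_ID}: ${INFILE} -> ${OUTFILE}"
```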
<note tip>