Differences
This shows you the differences between two versions of the page.
abstract:farber:runjobs:runjobs [2017-12-05 09:20] – sraskar | abstract:farber:runjobs:runjobs [2018-08-09 15:01] (current) – [What is a Job?] anita | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | < | + | ====== Running applications |
- | ====== Running applications ====== | + | |
- | //This section uses the wiki's [[00_conventions|documentation conventions]].// | + | ====== Introduction |
- | ===== Introduction ===== | + | |
- | The Grid Engine job scheduling system is used to manage and control the computing | + | The Grid Engine job scheduling system is used to manage and control the resources |
- | [[: | + | Without a job scheduler, a cluster user would need to manually search for the resources required by his or her job, perhaps by randomly logging in to nodes and checking for other users' programs already running there. |
- | In order to schedule any job (interactively or batch) on a cluster, you must set your [[general/userguide/04_compute_environ?&# | + | An outdated but still mostly relevant description of Grid Engine and job scheduling can be found in the first chapter of the [[http://docs.oracle.com/ |
- | See [[general:/ | + | ===== What is a Job? ===== |
- | ===== Runtime environment ===== | + | In this context, a //job// consists of: |
- | Generally, your runtime environment (path, environment variables, etc.) should be the same as your compile-time environment. Usually, the best way to achieve this is to put the relevant VALET commands in shell scripts. You can reuse common sets of commands | + | * a sequence |
+ | * a list of resource requirements and other properties affecting scheduling of the job | ||
+ | * a set of environment variables | ||
- | <note important> | + | For an // |
- | If you are writing | + | |
- | <code bash> | + | |
- | source / | + | |
- | </ | + | |
- | You do not need this command when you | + | |
- | - type commands, or source | + | |
- | - include lines in the file to be submitted | + | |
- | </ | + | |
- | ===== Job scheduling system ===== | + | |
- | A job scheduling system is used to manage | + | By comparison, a // |
- | Each investing-entity' | + | * a //job script// can be reused |
+ | * when resources are granted to the job it will execute immediately (day or night), yielding increased job throughput | ||
- | The standby queues are available for projects requiring more slots than purchased, or to take advantage of idle nodes when a job would have to wait in the owner queue. | + | An individual' |
- | A spillover queue may be available for the case where a job is submitted to the owner queue, and there are standby jobs consuming needed slots. Instead of waiting, the jobs will be sent to the spillover queue to start on a similar idle node. | + | ===== Queues ===== |
- | A spare queue may be on a cluster to make spare nodes available | + | At its most basic, a //queue// represents |
- | Each cluster | + | < |
+ | When submitting a job to Grid Engine, a user can explicitly specify which queue to use: doing so will place that queue' | ||
+ | ===== Job scheduling system ===== | ||
+ | A job scheduling system is used to manage and control the computing resources for all jobs submitted to a cluster. This includes load balancing, limiting resources, reconciling requests for memory and processor cores with availability of those resources, suspending and restarting jobs, and managing jobs with different priorities. | ||
- | ==== Array jobs ==== | + | Each investing-entity' |
- | An [[: | + | The standby queues are available for projects requiring more slots than purchased, or to take advantage of idle nodes when a job would have to wait in the owner queue. Other workgroup nodes will be used, so standby jobs have a time limit, and users are limited |
- | <note tip> | + | A spillover queue may be available for the case where a job is submitted |
- | The '' | + | |
- | + | ||
- | For example, the '' | + | |
- | </ | + | |
- | The general form of the **qsub** option is: | ||
- | -t // | ||
- | with a default step_size of 1. For these examples, the option would be: | ||
- | -t 2-5000: | + | ==== Grid Engine ==== |
- | Additional simple how-to examples | + | The Grid Engine job scheduling system is used to manage and control the computing resources |
- | ==== Chaining jobs ==== | + | In order to schedule any job (interactively or batch) on a cluster, you must set your [[abstract/ |
- | If you have multiple jobs and want some of them to run automatically after another job finishes, then you can use chaining. When you chain jobs, remember to check the status of the other job to determine if it successfully completed. This will prevent the system from flooding the scheduler | + | See [[abstract/ |
- | <code - doThing1.qs> | + | ===== Runtime environment ===== |
- | #$ -N doThing1 | + | Generally, |
- | # | + | |
- | # If you want an email message to be sent to you when your job ultimately | + | |
- | # finishes, edit the -M line to have your email address and change the | + | |
- | # next two lines to start with #$ instead of just # | + | |
- | # -m eas | + | |
- | # -M my_address@mail.server.com | + | |
- | # | + | |
- | # Setup the environment; | + | |
- | # line: | + | |
- | # Now append all of your shell commands necessary | + | <note important> |
- | # after this line: | + | If you are writing an executable script that does not have the **-l** option on the **bash** command, and you want to include VALET commands in your script, then you should include the line: |
- | ./dotask1 | + | <code bash> |
+ | source / | ||
</ | </ | ||
+ | You do not need this command when you | ||
+ | - type commands, or source the command file, | ||
+ | - include lines in the file to be submitted with qsub. | ||
+ | </ | ||
- | <code - doThing2.qs> | ||
- | #$ -N doThing2 | ||
- | #$ -hold_jid doThing1 | ||
- | # | ||
- | # If you want an email message to be sent to you when your job ultimately | ||
- | # finishes, edit the -M line to have your email address and change the | ||
- | # next two lines to start with #$ instead of just # | ||
- | # -m eas | ||
- | # -M my_address@mail.server.com | ||
- | # | ||
- | # Setup the environment; | ||
- | # line: | ||
- | # Now append all of your shell commands necessary to run your program | + | ===== Getting Help ===== |
- | # after this line: | + | |
- | # Here is where you should add a test to make sure | + | Grid Engine includes man pages for all of the commands |
- | # that dotask1 successfully completed before running | + | |
- | # ./dotask2 | + | |
- | # You might check if a specific file(s) exists that you would | + | |
- | # expect after a successful dotask1 run, something like this | + | |
- | # if [ -e dotask1.log ] | + | |
- | # then ./dotask2 | + | |
- | # fi | + | |
- | # If dotask1.log does not exist it will do nothing. | + | |
- | # If you don't need a test, then you would run the task. | + | |
- | | + | |
- | </ | + | |
- | < | + | < |
- | + | [traine@farber ~]$ man qstat | |
- | #$ -N doThing3 | + | |
- | #$ -hold_jid doThing2 | + | |
- | # | + | |
- | # If you want an email message to be sent to you when your job ultimately | + | |
- | # finishes, edit the -M line to have your email address and change the | + | |
- | # next two lines to start with #$ instead of just # | + | |
- | # -m eas | + | |
- | # -M my_address@mail.server.com | + | |
- | # | + | |
- | # Setup the environment; | + | |
- | # line: | + | |
- | + | ||
- | # Now append all of your shell commands necessary to run your program | + | |
- | # after this line: | + | |
- | # Here is where you should add a test to make sure | + | |
- | # that dotask2 successfully completed before running | + | |
- | # ./dotask3 | + | |
- | # You might check if a specific file(s) exists that you would | + | |
- | # expect after a successful dotask2 run, something like this | + | |
- | # if [ -e dotask2.log | + | |
- | # then ./dotask3 | + | |
- | # fi | + | |
- | # If dotask2.log does not exist it will do nothing. | + | |
- | # If you don't need a test, then just run the task. | + | |
- | | + | |
</ | </ | ||
- | Now submit all three job scripts. In this example, we are using account | + | to learn more about a Grid Engine command (in this case, '' |
- | < | + | < |
- | [(it_css:traine)@mills ~]$ qsub doThing1.qs | + | [traine@farber |
- | [(it_css: | + | usage: qstat [options] |
- | [(it_css:traine)@mills ~]$ qsub doThing3.qs | + | [-cb] view additional binding specific parameters |
+ | [-ext] | ||
+ | : | ||
</ | </ | ||
- | The basic flow is '' | + | //This section uses the wiki's [[http://docs-dev.hpc.udel.edu/doku.php#documentation-conventions|documentation conventions]].// |
- | + | ||
- | You might also want to have '' | + | |
- | + | ||
- | <code - doThing2.qs> | + | |
- | + | ||
- | #$ -N doThing2 | + | |
- | # | + | |
- | # If you want an email message to be sent to you when your job ultimately | + | |
- | # finishes, edit the -M line to have your email address and change the | + | |
- | # next two lines to start with #$ instead of just # | + | |
- | # -m eas | + | |
- | # -M my_address@mail.server.com | + | |
- | # | + | |
- | # Setup the environment; | + | |
- | # line: | + | |
- | + | ||
- | # Now append all of your shell commands necessary to run your program | + | |
- | # after this line: | + | |
- | ./dotask2 | + | |
- | </code> | + | |
- | + | ||
- | < | + | |
- | + | ||
- | #$ -N doThing3 | + | |
- | #$ -hold_jid doThing1, | + | |
- | # | + | |
- | # If you want an email message to be sent to you when your job ultimately | + | |
- | # finishes, edit the -M line to have your email address and change the | + | |
- | # next two lines to start with #$ instead of just # | + | |
- | # -m eas | + | |
- | # -M my_address@mail.server.com | + | |
- | # | + | |
- | # Setup the environment; | + | |
- | # line: | + | |
- | + | ||
- | # Now append all of your shell commands necessary to run your program | + | |
- | # after this line: | + | |
- | # Here is where you should add a test to make sure | + | |
- | # that dotask1 and dotask2 successfully completed before running | + | |
- | # ./dotask3 | + | |
- | # You might check if a specific file(s) exists that you would | + | |
- | # expect after a successful dotask1 and dotask2 run, something like this | + | |
- | # if [ -e dotask1.log -a -e dotask2.log | + | |
- | # then ./dotask3 | + | |
- | # fi | + | |
- | # If both files do not exist it will do nothing. | + | |
- | # If you don't need a test, then just run the task. | + | |
- | ./dotask3 | + | |
- | </ | + | |
- | + | ||
- | Now submit all three jobs again. However this time '' | + | |
- | before running. | + | |
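
The chained job scripts in the removed revision suggest testing for an expected output file (for example ''dotask1.log'') before running the next task. That test could be wrapped in a small shell function, sketched below. The function name ''run_if_outputs_exist'' and the ''--'' separator convention are hypothetical and are not part of Grid Engine or of the original scripts. The log file names are taken from the commented examples.

```bash
#!/bin/bash
# Hypothetical helper (not part of Grid Engine): run a command only if
# every listed file exists.
#
# Usage: run_if_outputs_exist FILE [FILE...] -- COMMAND [ARGS...]
#
# Returns 1 if a file is missing, 2 on a usage error, otherwise the
# exit status of COMMAND.
run_if_outputs_exist() {
    local f
    # Check each file named before the "--" separator.
    while [ "$#" -gt 0 ] && [ "$1" != "--" ]; do
        f="$1"
        if [ ! -e "$f" ]; then
            echo "expected output '$f' not found; skipping command" >&2
            return 1
        fi
        shift
    done

    # No "--" separator was given, so there is no command to run.
    if [ "$#" -eq 0 ]; then
        echo "usage: run_if_outputs_exist FILE [FILE...] -- COMMAND [ARGS...]" >&2
        return 2
    fi

    shift    # drop the "--" separator
    if [ "$#" -eq 0 ]; then
        echo "no command given after --" >&2
        return 2
    fi

    # Every expected file exists: run the command with its arguments.
    "$@"
}
```

In a script such as ''doThing3.qs'' that holds on both earlier jobs, the last line might then read ''run_if_outputs_exist dotask1.log dotask2.log -- ./dotask3''. The existence of a log file is only a weak sign of success. If the earlier task can fail after creating its log, a stronger check (for example, a marker file written only on success) may be more appropriate.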