Differences

This shows you the differences between two versions of the page.

--- abstract:mills:runjobs:runjobs [2018-05-17 13:21] – sraskar
+++ abstract:mills:runjobs:runjobs [2021-04-27 16:21] (current) – external edit 127.0.0.1
@@ Line 1: / Line 1: @@
+====== Running applications on Mills ======
-====== Using Grid Engine on Mills ======
+====== Introduction ======
-===== Introduction =====
+The Grid Engine job scheduling system is used to manage and control the resources available to computational tasks.  The job scheduler considers each job's resource requests (memory, disk space, processor cores) and executes it as those resources become available.  The order in which jobs are submitted and a //scheduling priority// also dictate how soon the job will be eligible to execute.  The job scheduler may suspend (and later restart) some jobs in order to more quickly complete jobs with higher scheduling priority.
-The Grid Engine job scheduling system is used to manage and control the computing resources for all jobs submitted to a cluster. This includes load balancing, reconciling requests for memory and processor cores with availability of those resources, suspending and restarting jobs, and managing jobs with different priorities. Grid Engine is also known as Oracle/Sun Grid Engine or SGE.
+Without a job scheduler, a cluster user would need to manually search for the resources required by his or her job, perhaps by randomly logging in to nodes and checking for other users' programs already executing thereon.  The user would have to "sign out" the nodes he or she wishes to use in order to notify the other cluster users of resource availability((Historically, this is actually how some clusters were managed!)).  A computer will perform this kind of chore more quickly and efficiently than a human can, and with far greater sophistication.
-[[:general:jobsched:grid-engine:start|Grid Engine job scheduling system]] provides an excellent overview of Grid Engine which is the job schedule system used on Mills.
+An outdated but still mostly relevant description of Grid Engine and job scheduling can be found in the first chapter of the [[http://docs.oracle.com/cd/E19957-01/820-0699/chp1-1|Sun N1™ Grid Engine 6.1 User's Guide]].
-In order to schedule any job (interactively or batch) on a cluster, you must set your [[general/userguide/04_compute_environ?&#using-workgroup-and-directories|workgroup]] to define your cluster group or //investing-entity// compute nodes.
+===== What is a Job? =====
-See [[general:/userguide:06_runtime_environ?&#scheduling-jobs|Scheduling Jobs]] and [[general:/userguide:06_runtime_environ?&#managing-jobs|Managing Jobs]] for general information about getting started with scheduling and managing jobs on a cluster using Grid Engine.
+In this context, a //job// consists of:
-===== The job queues on Mills =====
+  * a sequence of commands to be executed
+  * a list of resource requirements and other properties affecting scheduling of the job
+  * a set of environment variables
-Each investing-entity on a cluster has four //owner queues// that exclusively use the investing-entity's compute nodes. (They do not use any nodes belonging to others.) Grid Engine allows those queues to be selected only by members of the investing-entity's group.
+For an //[[abstract/mills/runjobs/schedule_jobs#interactive-jobs-qlogin|interactive job]]//, the user manually types the sequence of commands once the job is eligible for execution.  If the necessary resources for the job are not immediately available, then the user must wait; when resources are available, the user must be present at his/her computer in order to type the commands.  Since the job scheduler does not care about the time of day, this could happen anytime, day or night.
-There are also node-wise queues, //standby//, //standby-4h//, //spillover-24core//, //spillover-48core// and //idle//.  Grid Engine allows users to use nodes belonging to other investing-entities. (The idle queue is currently disabled.)
+By comparison, a //[[abstract/mills/runjobs/schedule_jobs#batch-jobs-qsub|batch job]]// does not require the user be awake and at his or her computer:  the sequence of commands is saved to a file, and that file is given to the job scheduler.  A file containing a sequence of shell commands is also known as a //script//, so in order to run batch jobs a user must become familiar with //shell scripting//.  The benefits of using batch jobs are significant:
-When submitting a batch job to Grid Engine, you specify the resources you need or want for your job. **//You don't actually specify the name of the queue//**. Instead, you include a set of directives that specify your job's characteristics. Grid Engine then chooses the most appropriate queue that meets those needs.
+  * a //job script// can be reused (versus repeatedly having to type the same sequence of commands for each job)
+  * when resources are granted to the job it will execute immediately (day or night), yielding increased job throughput
-The queue to which a job is assigned depends primarily on six factors:
+An individual's increased job throughput is good for all users of the cluster!
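As a rough illustration of the three parts of a job listed above, the sketch below shows a minimal batch job script. Lines starting with ''#$'' carry ''qsub'' options (''-N'' names the job, ''-cwd'' runs it in the submission directory); bash itself treats them as comments. The job name ''hello_job'' and the function ''describe_job'' are made up for this example, and ''JOB_ID'' is the variable Grid Engine sets inside a running job.

```shell
#!/bin/bash
# Minimal batch job script (a sketch, not one of the cluster's templates).
# Lines beginning with "#$" are read by qsub as embedded options;
# bash treats them as ordinary comments.
#$ -N hello_job
#$ -cwd

# Hypothetical unit of work: report which job this is and where it runs.
describe_job() {
    # JOB_ID is set by Grid Engine inside a job; fall back when run by hand.
    echo "job ${JOB_ID:-interactive} on $(uname -n)"
}

describe_job
```

Saved to a file (say ''hello.qs'', a name chosen only for illustration), the script could be submitted with ''qsub hello.qs'' and reused for later jobs without retyping any commands.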
-  * Whether the job is serial or parallel
+===== Queues =====
-  * Which parallel environment (e.g., openmpi, threads) is needed
-  * Which or how much of a resource is needed (e.g., max clock time, max memory)
-  * Whether the job can be suspended and restarted by the system.
-  * Whether the job is non-interactive or interactive
-  * Whether you want to use idle nodes belonging to others.
-For each investing-entity, the **owner-queue** names start with the investing-entity's name:
+At its most basic, a //queue// represents a collection of computing entities (call them nodes) on which jobs can be executed.  Each queue has properties that restrict what jobs are eligible to execute within it:  a queue may not accept interactive jobs; a queue may place an upper limit on how long the job will be allowed to execute or how much memory it can use; or specific users may be granted or denied permission to execute jobs in a queue.
-^   <<//investing_entity//>>''.q+''  | The default queue for non-interactive serial or parallel jobs. The primary queue for long-running jobs. These jobs must be able to be suspended and restarted by Grid Engine. They can be preempted by jobs submitted to the //development// queue, described next. Examples: all serial (single-core) jobs, openMPI jobs, openMP jobs or other jobs using the threads parallel environment. |
+<note>Grid Engine uses a //cluster queue// to embody the common set of properties that define the behavior of a queue.  The cluster queue acts as a template for the //queue instances// that exist for each node that executes jobs for the queue.  The term //queue// can refer to either of these, but in this documentation it will most often imply a //cluster queue//.</note>
-^  <<//investing_entity//>>''.q''  | A special queue for __non-suspendable__ parallel jobs, such as MPICH. These jobs will not be preempted by others' job submissions. |
-^  <<//investing_entity//>>''-qrsh.q''  | A special queue for interactive jobs only. Jobs are scheduled to this queue when you use Grid Engine's **qlogin** command. |
-^  ''standby.q''  | A special queue that spans all nodes, at most 240 slots per user.   Submissions will have a lower priority than jobs submitted to owner-queues, and standby jobs will only be started on lightly-loaded nodes.  These jobs will not be preempted by others' job submissions. Jobs will be terminated with notification after running for 8 hours of elapsed (wall-clock) time.  //Also see the ''standby-4h.q'' entry.//  |
-^  ::: | You must specify **-l standby=1** as a **qsub** option. You must also use the **-notify** option if your job traps the USR2 termination signal. [[general:jobsched:standby |(Details)]] |
-^  ''standby-4h.q''  | A special queue that spans all nodes, at most 816 slots per user.   Submissions will have a lower priority than jobs submitted to owner-queues, and standby jobs will only be started on lightly-loaded nodes.  These jobs will not be preempted by others' job submissions. Jobs will be terminated with notification after running for 4 hours of elapsed (wall-clock) time. |
-^  ::: | You must specify **-l standby=1** as a **qsub** option. And, if more than 240 slots are requested, you must also specify a maximum run-time of 4 hours or less via the **-l h_rt=//hh:mm:ss//** option. Finally, use the **-notify** option if your job traps the USR2 termination signal. [[general:jobsched:standby |(Details)]] |
-^  ''spillover-24core.q''  | A special queue that spans all standard nodes (24 cores) and is used by Grid Engine to map jobs when requested resources are unavailable on standard nodes in owner queues, e.g., node failure or other standby jobs are using owner resources. **Implemented on February 29, 2016** according to [[https://sites.udel.edu/research-computing/files/2016/01/MillsEnd-of-LifePlanandPolicies-3-1jp8lqd.pdf|Mills End-of-Life Policy]].|
-^  ''spillover-48core.q''  | A special queue that spans all 4-socket nodes (48 cores) and is used by Grid Engine to map jobs when requested resources are unavailable on 48-core nodes in owner queues, e.g., node failure or other standby jobs are using owner resources. Owners of only 48-core nodes will not spillover to standard nodes. **Implemented on February 29, 2016** according to [[https://sites.udel.edu/research-computing/files/2016/01/MillsEnd-of-LifePlanandPolicies-3-1jp8lqd.pdf|Mills End-of-Life Policy]].|
-^  ''spare.q''  | A special queue that spans all nodes kept in reserve as replacements for failed owner-nodes. Temporary access to the spare nodes will be granted by request. When access is granted, the spare nodes will augment your owner nodes.  Jobs on the spare nodes will not be preempted by others' job submissions, but may need to be killed by IT. The owner of a job running on a spare node will be notified by email two hours before IT kills the job. |
+When submitting a job to Grid Engine, a user can explicitly specify which queue to use:  doing so will place that queue's resource restrictions (e.g. maximum execution time, maximum memory) on the job, even if they are not appropriate.  Usually it is easier if the user specifies what resources his or her job requires and lets Grid Engine choose an appropriate queue.
+===== Job scheduling system =====
-<note tip>
+A job scheduling system is used to manage and control the computing resources for all jobs submitted to a cluster. This includes load balancing, limiting resources, reconciling requests for memory and processor cores with availability of those resources, suspending and restarting jobs, and managing jobs with different priorities.
-Be considerate in your use of the development queue. It may preempt '**q+**' jobs being run by other users in your group if those jobs' computational resources are needed.
+Each investing-entity's group (workgroup) has owner queues that allow the use of a fixed number of slots to match the total number of cores purchased.  If a job is submitted that would use more than the slots allowed, the job will wait until enough slots are made available by completed jobs.  There is no time limit imposed on owner queue jobs.  All users can see running and waiting jobs, which allows groups to work out policies for managing purchased nodes.
+The standby queues are available for projects requiring more slots than purchased, or to take advantage of idle nodes when a job would have to wait in the owner queue.  Other workgroup nodes will be used, so standby jobs have a time limit, and users are limited to a total number of cores for all of their standby jobs.  Generally, users can use 10 nodes for an 8 hour standby job or 40 nodes for a 4 hour standby job.
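Because standby jobs are terminated at their time limit, a job script may want to react to the warning signal. The sketch below is only an illustration: the options ''-l standby=1'', ''-l h_rt=//hh:mm:ss//'' and ''-notify'' are the ones named in the earlier revision of this page and should be checked against the current standby documentation, while the handler ''on_usr2'', the flag ''stop_requested'' and the three-step work loop are hypothetical.

```shell
#!/bin/bash
# Sketch of a standby job script that traps the USR2 warning signal.
# Option names come from the earlier revision of this page (assumption:
# they still apply); the 4-hour limit is just an example value.
#$ -l standby=1
#$ -l h_rt=4:00:00
#$ -notify

# Flag set by the handler so the work loop can stop cleanly.
stop_requested=0

on_usr2() {
    stop_requested=1
    echo "USR2 received: saving state before termination"
}
trap on_usr2 USR2

# Hypothetical work loop: a real job would do one unit of work per pass.
run_steps() {
    for step in 1 2 3; do
        if [ "$stop_requested" -eq 1 ]; then
            break
        fi
        echo "step $step"
    done
}

run_steps
```

The design idea is that the handler only sets a flag; the loop checks it between units of work, so the job stops at a point where its state is consistent.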
+A spillover queue may be available for the case where a job is submitted to the owner queue, and there are standby jobs consuming needed slots. Instead of waiting, the jobs will be sent to the spillover queue to start on a similar idle node.
+==== Grid Engine ====
+The Grid Engine job scheduling system is used to manage and control the computing resources for all jobs submitted to a cluster. This includes load balancing, reconciling requests for memory and processor cores with availability of those resources, suspending and restarting jobs, and managing jobs with different priorities. Grid Engine on Farber is Univa Grid Engine but still referred to as SGE.
+In order to schedule any job (interactively or batch) on a cluster, you must set your [[abstract/farber/system_access/system_access#logging-on-to-farber|workgroup]] to define your cluster group or //investing-entity// compute nodes.
+See [[abstract/farber/runjobs/schedule_jobs|Scheduling Jobs]] and [[abstract/farber/runjobs/job_status|Managing Jobs]] for general information about getting started with scheduling and managing jobs on a cluster using Grid Engine.
+===== Runtime environment =====
+Generally, your runtime environment (path, environment variables, etc.) should be the same as your compile-time environment. Usually, the best way to achieve this is to put the relevant VALET commands in shell scripts. You can reuse common sets of commands by storing them in a shell script file that can be //sourced// from within other shell script files.
+<note important>
+If you are writing an executable script that does not have the **-l** option on the **bash** command, and you want to include VALET commands in your script, then you should include the line:
+<code bash>
+source /etc/profile.d/valet.sh
+</code>
+You do not need this command when you
+  - type commands, or source the command file,
+  - include lines in the file to be submitted with **qsub**.
 </note>
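One possible shape for such a reusable, sourced file is sketched below. It assumes VALET's ''vpkg_require'' command is the one used to load a package; the function name ''setup_environment'' and any package name passed to it are placeholders, not part of the cluster's configuration.

```shell
#!/bin/bash
# Sketch of a reusable environment-setup file meant to be sourced by
# other scripts. The function name is hypothetical.

setup_environment() {
    # Make the VALET commands available when bash was not started with -l.
    if [ -r /etc/profile.d/valet.sh ]; then
        source /etc/profile.d/valet.sh
    fi
    # Load the requested package with VALET when it is present;
    # otherwise report the problem on stderr and carry on.
    if command -v vpkg_require >/dev/null 2>&1; then
        vpkg_require "$1"
    else
        echo "VALET not available; skipping package $1" >&2
    fi
}
```

Another script could then ''source'' this file and call ''setup_environment'' with the package it was compiled against, so the compile-time and runtime environments stay in step.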
+===== Getting Help =====
+Grid Engine includes man pages for all of the commands that will be reviewed in this document.  When logged in to a cluster, type
+<code bash>
+[traine@mills ~]$ man qstat
+</code>
+to learn more about a Grid Engine command (in this case, ''qstat'').  Most commands will also respond to the ''-help'' command-line option to provide a succinct usage summary:
+<code bash>
+[traine@mills ~]$ qstat -help
+usage: qstat [options]
+        [-cb]                             view additional binding specific parameters
+        [-ext]                            view additional attributes
+           :
+</code>
+//This section uses the wiki's [[http://docs-dev.hpc.udel.edu/doku.php#documentation-conventions|documentation conventions]].//
 ===== Resource-management options on Mills =====
@@ Line 92: / Line 129: @@
 ==== Parallel environments ====
-The ''/opt/templates/gridengine'' directory contains basic prototype job scripts for non-interactive parallel jobs. This section describes the **-pe** parallel environment option that's required for MPI jobs, openMP jobs and other jobs that use the SMP (threads) programming model.
+The ''/opt/shared/templates/gridengine'' directory contains basic prototype job scripts for non-interactive parallel jobs. This section describes the **-pe** parallel environment option that's required for MPI jobs, openMP jobs and other jobs that use the SMP (threads) programming model.
 Type the command:
@@ Line 126: / Line 163: @@
 <note tip>
-IT provides a job script template called ''openmp.qs'' available in ''/opt/templates/gridengine/openmp'' to copy and customize for your OpenMP jobs.
+IT provides a job script template called ''openmp.qs'' available in ''/opt/shared/templates/gridengine/openmp'' to copy and customize for your OpenMP jobs.
 </note>
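For orientation only, a threaded job script might tie the OpenMP thread count to the slots granted by the scheduler, roughly as follows. This is not a copy of ''openmp.qs''; the slot count of 4 and the helper ''threads_for_job'' are invented for the example, and ''NSLOTS'' is the variable Grid Engine sets to the number of slots granted.

```shell
#!/bin/bash
# Sketch of a threaded (OpenMP) job script; not the IT-provided template.
#$ -pe threads 4

# Grid Engine sets NSLOTS to the number of slots granted to the job;
# default to 1 so the script also runs outside the scheduler.
threads_for_job() {
    echo "${NSLOTS:-1}"
}

export OMP_NUM_THREADS="$(threads_for_job)"
echo "OpenMP threads: $OMP_NUM_THREADS"
```

Deriving ''OMP_NUM_THREADS'' from ''NSLOTS'' keeps the program from starting more threads than the cores the job was actually given.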
@@ Line 133: / Line 170: @@
 MPI jobs inherently generate considerable network traffic among the processor cores of a cluster's compute nodes. The processors on the compute node may be connected by two types of networks: InfiniBand and Gigabit Ethernet.
-IT has developed templates to help with the **openmpi** parallel environments for a given [[clusters/start|cluster]], targeting different user needs and architecture. You can copy the templates from ''/opt/templates/gridengine/openmpi'' and customize them. These templates are essentially identical with the exception of the presence or absence of certain **qsub** options and the values assigned to **MPI_FLAGS** based on using particular environment variables. In all cases, the parallel environment option must be specified:
+IT has developed templates to help with the **openmpi** parallel environments for a given [[clusters/start|cluster]], targeting different user needs and architecture. You can copy the templates from ''/opt/shared/templates/gridengine/openmpi'' and customize them. These templates are essentially identical with the exception of the presence or absence of certain **qsub** options and the values assigned to **MPI_FLAGS** based on using particular environment variables. In all cases, the parallel environment option must be specified:
 ''-pe openmpi'' <<//NPROC//>>
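As a hedged illustration of that option, the sketch below requests 48 slots and builds a launch command from the granted slot count. It does not reproduce the IT templates or their **MPI_FLAGS** handling; the slot count, the helper ''mpi_command'' and the program name ''./my_mpi_program'' are all hypothetical.

```shell
#!/bin/bash
# Sketch of an Open MPI job script header; not one of the IT templates.
#$ -pe openmpi 48

# Build the launch command from the slot count Grid Engine grants (NSLOTS),
# defaulting to 1 when the script is run outside the scheduler.
mpi_command() {
    echo "mpirun -np ${NSLOTS:-1} $1"
}

# Print the command instead of running it, so the sketch is safe to try.
echo "Would run: $(mpi_command ./my_mpi_program)"
```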
@@ Line 140: / Line 177: @@
 <note tip>
-IT provides several job script templates in ''/opt/templates/gridengine/openmpi'' to copy and customize for your Open MPI jobs. See [[software:openmpi:mills|Open MPI on Mills]] for more details about these job scripts.
+IT provides several job script templates in ''/opt/shared/templates/gridengine/openmpi'' to copy and customize for your Open MPI jobs. See [[software:openmpi:mills|Open MPI on Mills]] for more details about these job scripts.
 </note>