Running applications on Caviness
Introduction
The Slurm workload manager (job scheduling system) is used to manage and control the resources available to computational tasks. The job scheduler considers each job's resource requests (memory, disk space, processor cores) and executes it as those resources become available. As a cluster workload manager, Slurm has three key functions: (1) It allocates exclusive and/or non-exclusive access to resources (compute nodes) to users for some duration of time so they can perform work. (2) It provides a framework for starting, executing, and monitoring work (normally a parallel job) on the set of allocated nodes. (3) It arbitrates contention for resources by managing a queue of pending work.
Without a job scheduler, a cluster user would need to search manually for the resources his or her job requires, perhaps by logging in to nodes at random and checking for other users' programs already running there. The user would also have to "sign out" the nodes he or she wishes to use in order to notify other cluster users of resource availability[1]. A computer performs this kind of chore more quickly and efficiently than a human can, and with far greater sophistication.
Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters. Documentation for the current version of Slurm is provided by SchedMD: SchedMD Slurm Documentation.
When migrating from one scheduler to another, such as from GridEngine to Slurm, you may find it helpful to refer to SchedMD's Rosetta Stone, which shows equivalent commands across various schedulers, and to their two-page command/option summary.
Check /opt/shared/templates/slurm/ for updated or new templates to use as job scripts for running generic or specific applications; they are designed to provide the best performance on Caviness.
Need help? See Introduction to Slurm in UD's HPC community cluster environment.
What is a Job?
In this context, a job consists of:
- a sequence of commands to be executed
- a list of resource requirements and other properties affecting scheduling of the job
- a set of environment variables
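The three parts of a job listed above map directly onto a batch job script. The following is a minimal sketch, not a Caviness template: the job name, core count, memory, and time limit are illustrative values, and OMP_NUM_THREADS is just one example of an environment variable a job might set. Lines beginning with #SBATCH are read by sbatch as resource requests; to bash they are ordinary comments.

```bash
#!/bin/bash
#
# Minimal batch job script sketch; all values below are illustrative.
#
# (1) Resource requirements and other scheduling properties, read by sbatch.
#     There is no --partition line, so Slurm chooses a partition that fits.
#SBATCH --job-name=example
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --mem=8G
#SBATCH --time=0-01:00:00

# (2) Environment variables for the commands that follow. Inside the job,
#     Slurm sets SLURM_CPUS_PER_TASK; fall back to 1 when run outside Slurm.
export OMP_NUM_THREADS="${SLURM_CPUS_PER_TASK:-1}"

# (3) The sequence of commands to execute.
echo "Job ${SLURM_JOB_ID:-none} on ${HOSTNAME} using ${OMP_NUM_THREADS} thread(s)"
```

Saved under a hypothetical name such as myjob.qs, it would be submitted with sbatch myjob.qs after setting your workgroup. It deliberately leaves out --partition so that Slurm can pick a partition that satisfies the requests.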
For an interactive job, the user manually types the sequence of commands once the job is eligible for execution. If the necessary resources for the job are not immediately available, then the user must wait; when resources are available, the user must be present at his/her computer in order to type the commands. Since the job scheduler does not care about the time of day, this could happen anytime, day or night.
By comparison, a batch job does not require the user to be awake and at his or her computer: the sequence of commands is saved to a file, and that file is given to the job scheduler. A file containing a sequence of shell commands is also known as a script, so to run batch jobs a user must become familiar with shell scripting. The benefits of using batch jobs are significant:
- a job script can be reused (versus repeatedly having to type the same sequence of commands for each job)
- when resources are granted to the job it will execute immediately (day or night), yielding increased job throughput
An individual's increased job throughput is good for all users of the cluster!
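Submitting a batch job amounts to handing the script file to sbatch. The hypothetical helper below sketches that round trip: it submits a script with sbatch --parsable, which prints only the job ID (followed by ;cluster when a cluster name is present), extracts the job ID, and shows the job's queue entry with squeue. The function names are illustrative and are not part of Slurm.

```bash
#!/bin/bash
# Hypothetical helper: submit a batch job script and report its job ID.

# sbatch --parsable prints "jobid" or "jobid;cluster"; keep only the job ID.
jobid_from_parsable() {
    printf '%s\n' "${1%%;*}"
}

submit_and_report() {
    local script="$1" out jobid
    # Stop if sbatch rejects the submission.
    out=$(sbatch --parsable "$script") || return 1
    jobid=$(jobid_from_parsable "$out")
    echo "Submitted job $jobid"
    # The job is listed as PD (pending) until resources are granted, then R.
    squeue --jobs="$jobid"
}

# Only submit when running where Slurm is installed and a script was named.
if command -v sbatch >/dev/null 2>&1 && [ $# -ge 1 ]; then
    submit_and_report "$1"
fi
```

For example, ./submit.sh myjob.qs (both names hypothetical). Keeping the job ID is handy when later commands, such as squeue or scancel, need to refer to the same job.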
It is important for jobs to run on compute nodes, not login nodes. Without effective limits in place, a single user could monopolize a login node and leave the cluster inaccessible to others. Please review Per-process CPU time limits on Caviness login nodes, which summarizes the current resource limits on the Caviness login nodes and explains why additional limits are needed and how they are implemented.
Queues
At its most basic, a queue represents a collection of computing entities (call them nodes) on which jobs can be executed. Each queue has properties that restrict what jobs are eligible to execute within it: a queue may not accept interactive jobs; a queue may place an upper limit on how long the job will be allowed to execute or how much memory it can use; or specific users may be granted or denied permission to execute jobs in a queue.
In Slurm, queues are called partitions. When submitting a job to Slurm, a user can explicitly specify which partition to use; doing so places that partition's resource restrictions (e.g. maximum execution time, maximum memory) on the job, even if they are not appropriate for it. It is usually easier for the user to specify the resources his or her job requires and let Slurm choose an appropriate partition.
Slurm
The Slurm workload manager is used to manage and control the computing resources for all jobs submitted to a cluster. This includes load balancing, reconciling requests for memory and processor cores with availability of those resources, suspending and restarting jobs, and managing jobs with different priorities.
To schedule any job (interactive or batch) on a cluster, you must set your workgroup to define your cluster group or investing-entity compute nodes.
See Scheduling Jobs and Managing Jobs on the sidebar for general information about getting started with scheduling and managing jobs on a cluster using Slurm.
Runtime environment
Generally, your runtime environment (path, environment variables, etc.) should be the same as your compile-time environment. Usually, the best way to achieve this is to put the relevant VALET commands in shell scripts. You can reuse common sets of commands by storing them in a shell script file that can be sourced from within other shell script files. If you write an executable script that uses VALET commands and is neither sourced nor submitted with sbatch, include the following line in it:
source /etc/profile.d/valet.sh
You do not need this command when you:
- type the commands at the prompt, or source the command file;
- include the lines in the job script file submitted with sbatch.
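As a sketch of the case where the source line is needed, here is a hypothetical helper script meant to be run directly (neither sourced nor submitted with sbatch). vpkg_require is the VALET command that adds a package to the current environment; the package id gcc is a placeholder, and the fallback branch exists only so the script still runs on a machine without VALET.

```bash
#!/bin/bash
# Hypothetical helper script that is executed directly, so it must load
# VALET itself before calling any vpkg_* command.

# Load VALET's shell functions, if this system provides them.
if [ -r /etc/profile.d/valet.sh ]; then
    source /etc/profile.d/valet.sh
fi

# Succeeds only if the VALET command vpkg_require is now defined.
valet_available() {
    type vpkg_require >/dev/null 2>&1
}

if valet_available; then
    # Placeholder package id: require the same package you compiled with,
    # so the runtime environment matches the compile-time environment.
    vpkg_require gcc
else
    echo "VALET not found; environment left unchanged" >&2
fi
```

Sourcing a shared file of VALET commands from several such scripts keeps their environments consistent without repeating the vpkg_require lines in each one.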
Getting Help
Slurm includes man pages for all of the commands reviewed in this document. When logged in to a cluster, type
[traine@caviness ~]$ man squeue
to learn more about a Slurm command (in this case, squeue). Most commands also respond to the --help command-line option with a succinct usage summary:
[traine@caviness ~]$ squeue --help
Usage: squeue [OPTIONS]
  -A, --account=account(s)    comma separated list of accounts to view, default is all accounts
  -a, --all                   display jobs in hidden partitions
      --array-unique          display one unique pending job array element per line
      --federation            Report federated information if a member of one
  :
This section uses the wiki's documentation conventions.