====== DARWIN Slurm Job Script Templates ======
As on Caviness, environment-sensing and setup code has been moved out of the job script templates and into external script fragments that are sourced (executed in the job script's own shell) by the job script. What remains in the job script templates is the setting of variables that influence how those external fragments execute, followed by the sourcing of the fragments themselves. When IT-RCI must change the behavior of job scripts, the external fragments are modified and the change takes effect for all job scripts derived from the templates.
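The "set variables, then source a fragment" pattern can be illustrated with a minimal, self-contained sketch. Everything in it is hypothetical: the fragment file, the ''DEMO_*'' variables and the use of ''OMP_NUM_THREADS'' are stand-ins chosen for illustration, not the contents of the real DARWIN fragments. The job script portion only sets variables and sources the fragment; the fragment decides what to do with them.

```bash
#!/usr/bin/env bash
# Hypothetical illustration of the "set variables, then source a fragment"
# pattern; the fragment and all DEMO_* names are stand-ins, not DARWIN code.

# Create a stand-in fragment in a scratch directory.
demo_libexec=$(mktemp -d)
cat > "$demo_libexec/demo-fragment.sh" <<'EOF'
# Hypothetical fragment: its behavior is driven by variables set beforehand.
DEMO_NTHREADS=${DEMO_NTHREADS:-1}          # default when the job script is silent
if [ -n "$DEMO_VERBOSE" ]; then
    echo "demo-fragment: configuring $DEMO_NTHREADS thread(s)"
fi
export OMP_NUM_THREADS=$DEMO_NTHREADS      # visible to programs the job runs
EOF

# Job script portion: set the influencing variables, then source the fragment.
DEMO_VERBOSE=1
DEMO_NTHREADS=4
. "$demo_libexec/demo-fragment.sh"
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS"
```

Because every job script sources the same file, editing the fragment once changes the behavior of all of them, which is the point of the design described above.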
===== Where Can I Find Them? =====
IT-RCI staff maintain the DARWIN templates in a git repository. The production copy of the repository is checked out in ''/opt/shared/slurm/templates'', and a symbolic link at ''/opt/shared/templates/slurm'' maintains parity with other HPC systems.
The external script fragments (mentioned above and discussed in detail below) can be found in the ''/opt/shared/slurm/templates/libexec'' directory.
The ''/opt/shared/templates/slurm'' symbolic link points to the ''/opt/shared/slurm/templates/job-scripts'' directory, which is organized by job class; its top level is split into ''applications'' and ''generic''.
==== Applications ====
Software packages with unique runtime requirements have a single script, or a directory of scripts, in the ''applications'' directory. TensorFlow is a good example: DARWIN uses Linux containers (created by Google and distributed via Docker) to execute TensorFlow scripts on compute nodes. Gaussian requires extra work to tailor its input files and the environment variables it expects to the resources Slurm has allocated to the job.
==== Generic ====
The application-specific job scripts are actually based on the generic scripts present in the ''generic'' directory. Serial jobs can make use of the ''serial.qs'' script; programs leveraging threaded (e.g. OpenMP) parallelism can use ''threads.qs'' as a starting point.
The ''mpi'' directory divides MPI programs into implementation-specific variants: ''mpich'', ''openmpi'', and ''generic'' (a catch-all that uses machine files and should generally NOT be used).
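Starting a new job is then a matter of copying the closest template and editing the copy. The helper below is a hypothetical convenience function (''new_job_script'' and the ''TEMPLATE_ROOT'' override are not part of the DARWIN templates); it only assumes the ''generic/serial.qs'' and ''generic/threads.qs'' layout described above.

```bash
#!/usr/bin/env bash
# Hypothetical helper: copy a job script template into a new file to edit.
# TEMPLATE_ROOT defaults to the DARWIN symlink; override it to test elsewhere.
TEMPLATE_ROOT=${TEMPLATE_ROOT:-/opt/shared/templates/slurm}

new_job_script() {
    local template=$1 dest=$2
    local src="$TEMPLATE_ROOT/$template"
    if [ ! -f "$src" ]; then
        echo "new_job_script: no template at $src" >&2
        return 1
    fi
    if [ -e "$dest" ]; then             # never clobber an existing job script
        echo "new_job_script: refusing to overwrite $dest" >&2
        return 1
    fi
    cp "$src" "$dest"
}
```

For example, ''new_job_script generic/threads.qs my-openmp-job.qs'' copies the threaded template into the current directory, ready to be edited and submitted with ''sbatch my-openmp-job.qs''.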
===== Hierarchical Modularity =====
Each environment setup task has been abstracted into its own external script fragment file, as a listing of the fragment directory shows:
<code>
-rw-r--r-- 1 frey sysadmin 1733 Sep 12  2018 common.sh
-rw-r--r-- 1 frey sysadmin 5621 May  6 14:11 gaussian.sh
-rw-r--r-- 1 frey sysadmin 1580 Sep 12  2018 generic-mpi.sh
-rw-r--r-- 1 frey sysadmin 1432 Sep 14  2018 mpich.sh
-rw-r--r-- 1 frey sysadmin 4805 Sep 12  2018 openmpi.sh
-rw-r--r-- 1 frey sysadmin 2209 May  6 14:11 openmp.sh
</code>
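Fragments can also build on one another. One plausible way to layer them is sketched below: a specialized fragment sources a shared base fragment that is protected by an include guard. The file names, ''DEMO_*'' variables and the guard are all hypothetical and are not taken from the real ''libexec'' files.

```bash
#!/usr/bin/env bash
# Hypothetical layering of fragments: a specialized fragment sources a shared
# base fragment first. Names are illustrative, not DARWIN's libexec files.
DEMO_LIBDIR=$(mktemp -d)

cat > "$DEMO_LIBDIR/base.sh" <<'EOF'
# Include guard: sourcing the base fragment twice has no extra effect.
if [ -z "$DEMO_BASE_LOADED" ]; then
    DEMO_BASE_LOADED=1
    DEMO_SETUP_STEPS="base"
fi
EOF

cat > "$DEMO_LIBDIR/special.sh" <<'EOF'
# Specialized fragment: pull in the shared setup, then add its own step.
. "$DEMO_LIBDIR/base.sh"
DEMO_SETUP_STEPS="$DEMO_SETUP_STEPS special"
EOF

# A job script sources only the most specific fragment it needs.
. "$DEMO_LIBDIR/special.sh"
echo "setup steps: $DEMO_SETUP_STEPS"
```

Under this arrangement a fix to the base fragment reaches every specialized fragment, and through them every job script, without editing either.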
===== Signal Handling =====
For jobs that register a preemption/timeout signal handler, the ''common.sh'' fragment adds the ''UD_EXEC'' function to the job environment. By default, if Bash receives a signal for which a trap has been set while it is waiting for a foreground command to complete, the trap is not executed until that command has finished. Consider a job script like this:
<code bash>
   :
cleanup() {
    echo "Time limit exceeded, scrubbing all junk files now"
    exit 0
}
UD_JOB_EXIT_FN=cleanup
. /opt/shared/slurm/templates/libexec/common.sh
sleep 500000000
</code>
When this job is preempted, the ''SIGTERM'' signal is delivered to the Bash shell. The shell notes the signal but does not handle it until the ''sleep'' command finishes. Since that sleep would last roughly 15.85 years, the 5-minute grace period given to jobs that are preempted or time out runs out, and the job is killed, long before the ''cleanup'' function is ever called.
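The deferral is easy to reproduce outside Slurm. In this sketch a helper process sends ''SIGTERM'' to a subshell after ''delay'' seconds while the subshell runs a foreground ''sleep'' of ''duration'' seconds. The function name ''deferred_trap_demo'' and its parameters are made up for the demonstration, and ''BASHPID'' requires Bash 4 or newer.

```bash
#!/usr/bin/env bash
# Reproduce trap deferral: SIGTERM arrives during a foreground command, but
# the trap only runs once that command has finished.
deferred_trap_demo() {
    local delay=${1:-1} duration=${2:-3}
    (
        SECONDS=0                              # restart the shell's seconds counter
        trap 'echo "trap ran after ${SECONDS}s"' TERM
        me=$BASHPID                            # PID of this subshell, not of $$
        ( sleep "$delay"; kill -TERM "$me" ) & # simulated preemption signal
        sleep "$duration"                      # foreground: the trap is deferred
        echo "foreground command finished"
    )
}
```

Calling ''deferred_trap_demo 1 3'' prints the trap message only after about three seconds, immediately followed by ''foreground command finished'', even though the signal was sent after one second.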
For Bash to handle signals asynchronously, long-running commands must be run in the background. The ''UD_EXEC'' function does just that:
<code bash>
   :
cleanup() {
    echo "Time limit exceeded, scrubbing all junk files now"
    exit 0
}
UD_JOB_EXIT_FN=cleanup
. /opt/shared/slurm/templates/libexec/common.sh
UD_EXEC sleep 500000000
</code>
The ''sleep'' command is executed in the background and when the ''SIGTERM'' preemption/timeout signal is delivered, the shell immediately calls the ''cleanup'' function as expected.
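A common way to get this behavior relies on a documented Bash feature: when a signal with a trap arrives while the shell is blocked in the ''wait'' builtin, ''wait'' returns immediately with a status greater than 128 and the trap runs. The function below is a hypothetical stand-in written only to show that technique; ''bg_exec'' and ''BG_EXEC_PID'' are invented names, and the actual ''UD_EXEC'' in ''common.sh'' may be implemented differently.

```bash
#!/usr/bin/env bash
# HYPOTHETICAL stand-in for UD_EXEC; not the code in common.sh.
# Runs a command in the background and waits on it with the `wait` builtin,
# which Bash interrupts as soon as a trapped signal arrives.
BG_EXEC_PID=""   # PID of the most recent background command (invented name)

bg_exec() {
    "$@" &
    BG_EXEC_PID=$!
    local rc
    while :; do
        rc=0
        wait "$BG_EXEC_PID" || rc=$?
        # A status above 128 means either the child died from a signal or
        # `wait` was interrupted by a trapped signal whose handler returned.
        # If the child is still alive it was the latter: resume waiting.
        if (( rc > 128 )) && kill -0 "$BG_EXEC_PID" 2>/dev/null; then
            continue
        fi
        return "$rc"
    done
}
```

The loop keeps waiting if a handler returns without exiting, so the background command is not abandoned; a handler that exits, like ''cleanup'' above, ends the script right away. Recording the child's PID lets such a handler stop the background command before it exits.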