Page ''technical:slurm:auto_tmpdir'', revision of 2020-03-12 11:44 by frey.
===== The auto_tmpdir plugin =====

The original **auto_tmpdir** Slurm plugin has been rewritten (as of March 12, 2020) so that it no longer sets ''$TMPDIR'' to the per-job directory it creates. Instead, it creates the following per-job directories and bind-mounts them over the standard temporary paths, so the job's processes find them at the usual locations:

^Directory created^Bind mountpoint^
|''/tmp/job-«job-id»''| |
|''/tmp/job-«job-id»/tmp''|''/tmp''|
|''/tmp/job-«job-id»/var_tmp''|''/var/tmp''|
|''/dev/shm/job-«job-id»''|''/dev/shm''|
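
The table can be read as a list of (source, target) bind mounts. The Python sketch below is illustrative only and is not the plugin's own code. The helper names ''local_layout'', ''mount_order'' and ''apply_layout'' are hypothetical, and the use of a private Linux mount namespace (''unshare(CLONE_NEWNS)'') is an assumption: this page only says the directories are bind-mounted. One detail the sketch makes explicit is ordering: ''/tmp'' has to be bound last, because once it is covered the remaining sources under ''/tmp/job-«job-id»'' are no longer reachable by that path.

```python
import ctypes
import os

# Linux constants from <sched.h> and <sys/mount.h>
CLONE_NEWNS = 0x00020000
MS_BIND = 0x1000
MS_REC = 0x4000
MS_PRIVATE = 1 << 18


def local_layout(job_id):
    """(directory created, bind mountpoint or None) pairs, as in the table."""
    job_dir = f"/tmp/job-{job_id}"
    return [
        (job_dir, None),  # parent directory only; not mounted anywhere
        (f"{job_dir}/tmp", "/tmp"),
        (f"{job_dir}/var_tmp", "/var/tmp"),
        (f"/dev/shm/job-{job_id}", "/dev/shm"),
    ]


def mount_order(layout):
    """Entries that get mounted, with /tmp moved to the end.

    After /tmp is covered, a source such as /tmp/job-N/var_tmp would
    resolve inside the new mount, so /tmp must be bound after the others.
    """
    mounts = [entry for entry in layout if entry[1] is not None]
    return sorted(mounts, key=lambda entry: entry[1] == "/tmp")


def _check(ret):
    """Turn a failed libc call (non-zero return) into an OSError."""
    if ret != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def apply_layout(layout, uid, gid):
    """Create the directories and bind-mount them (Linux only, needs root)."""
    libc = ctypes.CDLL(None, use_errno=True)
    libc.mount.argtypes = [ctypes.c_char_p, ctypes.c_char_p, ctypes.c_char_p,
                           ctypes.c_ulong, ctypes.c_void_p]
    for directory, _ in layout:
        os.makedirs(directory, mode=0o700, exist_ok=True)
        os.chown(directory, uid, gid)  # the job's user must own its dirs
    # New mount namespace for this process, so other jobs keep the real /tmp
    _check(libc.unshare(CLONE_NEWNS))
    # Make all mounts private so the binds below do not propagate to the host
    _check(libc.mount(None, b"/", None, MS_REC | MS_PRIVATE, None))
    for directory, mountpoint in mount_order(layout):
        _check(libc.mount(directory.encode(), mountpoint.encode(),
                          None, MS_BIND, None))
```

For example, ''local_layout(1234)'' yields the four rows of the table with «job-id» replaced by 1234; ''apply_layout'' only makes sense as root, in the process that goes on to start the job's tasks.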

==== Shared tmpdir ====

In some cases the user may want the job's ''/tmp'' directory to be shared by all nodes participating in the job, e.g. placed somewhere on ''/lustre/scratch''. The **auto_tmpdir** plugin adds a ''--use-shared-tmpdir'' flag to the **salloc**, **srun**, and **sbatch** commands to request this (''/dev/shm'' is still a per-node directory):

^Directory created^Bind mountpoint^
|''/lustre/scratch/slurm/job-«job-id»''| |
|''/lustre/scratch/slurm/job-«job-id»/tmp''|''/tmp''|
|''/lustre/scratch/slurm/job-«job-id»/var_tmp''|''/var/tmp''|
|''/dev/shm/job-«job-id»''|''/dev/shm''|

A variant of the shared scheme gives each node its own subdirectory of the shared job directory (''--use-shared-tmpdir=per-node''):

^Directory created^Bind mountpoint^
|''/lustre/scratch/slurm/job-«job-id»''| |
|''/lustre/scratch/slurm/job-«job-id»/«hostname»''| |
|''/lustre/scratch/slurm/job-«job-id»/«hostname»/tmp''|''/tmp''|
|''/lustre/scratch/slurm/job-«job-id»/«hostname»/var_tmp''|''/var/tmp''|
|''/dev/shm/job-«job-id»''|''/dev/shm''|
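
The two shared layouts differ only in where the ''tmp'' and ''var_tmp'' directories live. The hypothetical helper ''shared_layout'' below builds the (directory, mountpoint) pairs from the two tables for ''--use-shared-tmpdir'' and ''--use-shared-tmpdir=per-node''. The base path is copied from the tables; filling «hostname» with ''socket.gethostname()'' is an assumption, since the plugin might use the short rather than the fully qualified host name.

```python
import socket


def shared_layout(job_id, mode="shared", hostname=None,
                  base="/lustre/scratch/slurm"):
    """Return (directory created, bind mountpoint or None) pairs.

    mode "shared":   one tmp/var_tmp pair used by every node in the job.
    mode "per-node": a subdirectory per host under the shared job directory.
    """
    job_dir = f"{base}/job-{job_id}"
    entries = [(job_dir, None)]
    if mode == "per-node":
        # Assumption: «hostname» is whatever socket.gethostname() reports
        node_dir = f"{job_dir}/{hostname or socket.gethostname()}"
        entries.append((node_dir, None))
    elif mode == "shared":
        node_dir = job_dir
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    entries += [
        (f"{node_dir}/tmp", "/tmp"),
        (f"{node_dir}/var_tmp", "/var/tmp"),
        # /dev/shm stays node-local in both shared modes
        (f"/dev/shm/job-{job_id}", "/dev/shm"),
    ]
    return entries
```

Because the sources live on ''/lustre'' rather than under ''/tmp'', these layouts do not have the mount-ordering constraint of the local case.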

When ''--use-shared-tmpdir'' is in effect, adding the ''--no-rm-tmpdir'' flag asks the plugin to //not// remove the directories when the job exits.

<WRAP center round important 60%>
Use the ''--no-rm-tmpdir'' flag with great care: files left behind on ''/lustre/scratch'' continue to consume capacity on that file system. A reasonable use is debugging a job script that copies input files into the job's temporary directory, runs a program, and then copies the results back to other storage; keeping the directory lets you inspect what was left there after the job ends. Once the script works and moves into production, stop using the ''--no-rm-tmpdir'' and ''--use-shared-tmpdir'' flags.
</WRAP>

===== Source code =====

The source code for the **auto_tmpdir** plugin is publicly available on [[https://github.com/jtfrey/auto_tmpdir/|GitHub]].