technical:slurm:auto_tmpdir

Old revision: technical:slurm:auto_tmpdir [2020-03-12 11:03] frey
New revision: technical:slurm:auto_tmpdir [2020-03-12 11:44] frey
Line 47:
 ===== The auto_tmpdir plugin =====
  
 +The original **auto_tmpdir** Slurm plugin has been rewritten (as of March 12, 2020) to no longer set ''$TMPDIR'' to the per-job directory it creates.  Instead, it creates the following paths and bind-mounts them:
  
 +^Directory created^Bind mountpoint^
 +|''/tmp/job-«job-id»''| |
 +|''/tmp/job-«job-id»/tmp''|''/tmp''|
 +|''/tmp/job-«job-id»/var_tmp''|''/var/tmp''|
 +|''/dev/shm/job-«job-id»''|''/dev/shm''|
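Because the plugin no longer sets ''$TMPDIR'', job code that used to read that variable may need a fallback. The following is a minimal sketch, not part of the plugin: the helper name ''job_scratch_dir'' is hypothetical, and it assumes that writing to ''/tmp'' inside the job lands in the per-job directory through the bind mount listed in the table above.

```python
import os


def job_scratch_dir(environ=None):
    """Return the directory a job script should use for temporary files.

    Hypothetical helper: honors TMPDIR when the user or site has set it,
    otherwise falls back to /tmp, which the table above lists as a bind
    mountpoint for the per-job directory.
    """
    env = os.environ if environ is None else environ
    # An empty TMPDIR is treated the same as an unset one.
    return env.get("TMPDIR") or "/tmp"


if __name__ == "__main__":
    print(job_scratch_dir())
```

Passing a dictionary instead of relying on ''os.environ'' keeps the helper easy to check outside of a running job.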
 +
 +==== Shared tmpdir ====
 +
 +In some cases the user may want the ''/tmp'' directory for the job to be shared by all nodes participating in the job, e.g. somewhere on ''/lustre/scratch''. The **auto_tmpdir** plugin implements a ''--use-shared-tmpdir'' flag to the **salloc/srun/sbatch** commands to request this:
 +
 +^Directory created^Bind mountpoint^
 +|''/lustre/scratch/slurm/job-«job-id»''| |
 +|''/lustre/scratch/slurm/job-«job-id»/tmp''|''/tmp''|
 +|''/lustre/scratch/slurm/job-«job-id»/var_tmp''|''/var/tmp''|
 +|''/dev/shm/job-«job-id»''|''/dev/shm''|
 +
 +A variant on the shared temporary directory scheme is to have each node use its own separate subdirectory (''--use-shared-tmpdir=per-node''):
 +
 +^Directory created^Bind mountpoint^
 +|''/lustre/scratch/slurm/job-«job-id»''| |
 +|''/lustre/scratch/slurm/job-«job-id»/«hostname»''| |
 +|''/lustre/scratch/slurm/job-«job-id»/«hostname»/tmp''|''/tmp''|
 +|''/lustre/scratch/slurm/job-«job-id»/«hostname»/var_tmp''|''/var/tmp''|
 +|''/dev/shm/job-«job-id»''|''/dev/shm''|
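The three tables above follow one pattern, which can be sketched as a small function. This is an illustration only and not the plugin's own code: the function name ''tmpdir_layout'' and the mode labels ''local'', ''shared'' and ''per-node'' are hypothetical, and the ''/lustre/scratch/slurm'' prefix is taken from the tables and may differ by site.

```python
def tmpdir_layout(job_id, mode="local", hostname=None):
    """Map each per-job directory to the path it is bind-mounted on.

    mode is a hypothetical label for the three cases in the tables:
    "local" (no flag), "shared" (--use-shared-tmpdir) and
    "per-node" (--use-shared-tmpdir=per-node).
    """
    if mode == "local":
        base = f"/tmp/job-{job_id}"
    elif mode == "shared":
        base = f"/lustre/scratch/slurm/job-{job_id}"
    elif mode == "per-node":
        if not hostname:
            raise ValueError("per-node mode needs a hostname")
        base = f"/lustre/scratch/slurm/job-{job_id}/{hostname}"
    else:
        raise ValueError(f"unknown mode: {mode!r}")

    return {
        f"{base}/tmp": "/tmp",
        f"{base}/var_tmp": "/var/tmp",
        # /dev/shm stays node-local in all three tables.
        f"/dev/shm/job-{job_id}": "/dev/shm",
    }


if __name__ == "__main__":
    for src, dst in tmpdir_layout(1234, "per-node", "n001").items():
        print(f"{src} -> {dst}")
```

Only the parent of ''tmp'' and ''var_tmp'' changes between the modes; the ''/dev/shm'' entry is the same in every case.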
 +
 +When using the ''--use-shared-tmpdir'' flag, the plugin can also be asked to //not// remove the directories when the job exits by including the ''--no-rm-tmpdir'' flag.
 +
 +<WRAP center round important 60%>
 +The ''--no-rm-tmpdir'' flag should be used very cautiously, since leaving files behind on ''/lustre/scratch'' will consume capacity on that file system.  A viable usage scenario would be debugging a job script that copies files to local scratch, runs a job, then copies results back to other storage.  Once that behavior is debugged and goes into production the user would stop using the ''--no-rm-tmpdir'' and ''--use-shared-tmpdir'' flags.
 +</WRAP>
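For the debugging scenario described above, a submission wrapper might assemble the **sbatch** command line as follows. This is a sketch under stated assumptions: the function name ''sbatch_argv'' and the script name ''job.sh'' are hypothetical, only the two flags named on this page are used, and the check that ''--no-rm-tmpdir'' requires ''--use-shared-tmpdir'' is a reading of the text above rather than confirmed plugin behavior.

```python
def sbatch_argv(script, shared=None, keep=False):
    """Build an sbatch argument list using the auto_tmpdir flags.

    shared: None/False for the default layout, True for
            --use-shared-tmpdir, or "per-node" for the per-node variant.
    keep:   add --no-rm-tmpdir (assumed to require a shared tmpdir).
    """
    argv = ["sbatch"]
    if shared is True:
        argv.append("--use-shared-tmpdir")
    elif shared == "per-node":
        argv.append("--use-shared-tmpdir=per-node")
    elif shared not in (None, False):
        raise ValueError(f"unsupported shared value: {shared!r}")

    if keep:
        if not shared:
            # Assumption drawn from the text: the flag is only described
            # in combination with --use-shared-tmpdir.
            raise ValueError("keep=True requires a shared tmpdir")
        argv.append("--no-rm-tmpdir")

    argv.append(script)
    return argv


if __name__ == "__main__":
    # job.sh is a placeholder script name.
    print(" ".join(sbatch_argv("job.sh", shared=True, keep=True)))
```

Once the job script is debugged, calling the wrapper with its defaults drops both flags, which matches the production usage recommended above.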
 +
 +===== Source code =====
 +
 +The source code for the **auto_tmpdir** plugin is publicly available on [[https://github.com/jtfrey/auto_tmpdir/|GitHub]].