This document summarizes alterations to the Slurm job scheduler configuration on the Caviness cluster.
When the salloc command is used without a script and arguments, the value configured in the SallocDefaultCommand key (in /etc/slurm/slurm.conf) provides the default command to execute on the allocated resources. For example:
[(workgroup:user)@login01 ~]$ salloc --partition=devel
salloc: Granted job allocation 13065047
salloc: Waiting for resource configuration
salloc: Nodes r00n56 are ready for job
[user@r00n56 ~]$
The default command as defined in the Slurm configuration mirrors the suggested default from the Slurm developers:
SallocDefaultCommand="srun -n1 -N1 --mpi=none --pty $SHELL"
Thus, the salloc command illustrated above is equivalent to:
[(workgroup:user)@login01 ~]$ salloc --partition=devel srun -n1 -N1 --mpi=none --pty $SHELL
salloc: Granted job allocation 13065047
salloc: Waiting for resource configuration
salloc: Nodes r00n56 are ready for job
[user@r00n56 ~]$
There are several issues with this default command:
- -n1 -N1 limits the remote shell to accessing a single task of the allocation
- The shell ($SHELL, coming from the user's current environment) is not executed as a login shell
- srun by default propagates the user's current environment variables to the remote node(s); we generally do not recommend this behavior on Caviness
On the first point, the Slurm allocation may have been for -N1 -n4 -c8 (one node, four tasks, eight CPUs per task), but the remote shell will only have access to one task (with eight CPUs). The user more likely expected the remote shell to have access to the full set of resources allocated on the primary node assigned to the job, akin to the batch step in submitted job scripts.
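To make the arithmetic concrete, here is a minimal sketch that assumes the -N1 -n4 -c8 request from the text. The variable names are illustrative only; inside a real allocation the same values could instead be read from SLURM_NTASKS and SLURM_CPUS_PER_TASK.

```shell
#!/bin/sh
# Values assumed from the example request: -N1 -n4 -c8
ntasks=4          # -n4: four tasks
cpus_per_task=8   # -c8: eight CPUs per task

# With a single node (-N1), every task lands on the primary node.
allocated=$((ntasks * cpus_per_task))

echo "CPUs allocated on the primary node: $allocated"
echo "CPUs visible to a one-task shell:   $cpus_per_task"
```

Under these assumptions, a shell confined to one task sees only a quarter of what the job was granted on that node.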
The second and third points may prevent some runtime environment setup from happening; this can be problematic when exported environment variables are reconstituted in the runtime environment by Slurm, but unexported variables, aliases, and functions are not restored. Our best practice for job scripts is to send no environment variables from the submission environment to the runtime environment; ideally, the same should be observed for interactive sessions.
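The partial restoration described above can be imitated without Slurm. In the sketch below, a child sh process stands in for the remote shell, and the names DEMO_EXPORTED, DEMO_LOCAL, and demo_fn are hypothetical. It only shows ordinary process inheritance, which is presumably what makes a propagated environment incomplete.

```shell
#!/bin/sh
# Hypothetical names used purely for illustration.
DEMO_EXPORTED="visible"; export DEMO_EXPORTED   # exported: inherited by children
DEMO_LOCAL="hidden"                             # set but never exported
demo_fn() { echo "from function"; }             # shell function, not exported

# The child shell stands in for the remote shell started on the compute node.
child_report=$(sh -c '
  echo "exported=${DEMO_EXPORTED:-unset}"
  echo "local=${DEMO_LOCAL:-unset}"
  if command -v demo_fn >/dev/null 2>&1; then
    echo "function=present"
  else
    echo "function=unset"
  fi
')
echo "$child_report"
```

Only the exported variable survives the hop. Anything else a user set up by hand on the login node has to be recreated, which is what a login shell's startup files are for.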
To address the issue of all resources on the primary node not being made available to the remote shell, the node and task counts will be dropped from the SallocDefaultCommand. The --cpu-bind=none flag will also be added; otherwise, slurmstepd applies a task affinity mask to the shell that restricts it to just one of the allocated physical CPU cores.
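One way to see which affinity mask a shell ended up with is to query the kernel from inside the interactive session. This is a minimal sketch, assuming a Linux node with /proc mounted and GNU coreutils' nproc available; the variable names are illustrative.

```shell
#!/bin/sh
# Cpus_allowed_list is the set of cores this process (and the shell it was
# started from) is permitted to run on.
allowed=$(grep -i '^Cpus_allowed_list' /proc/self/status | awk '{print $2}')

# nproc honours the affinity mask, so it counts usable cores only.
usable=$(nproc)

echo "affinity list: $allowed"
echo "usable CPUs:   $usable"
```

Going by the description above, a shell started with a task affinity mask would be expected to report a single core here, while one started with --cpu-bind=none should report every core allocated on the node.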
The majority of command shells recognize the -l flag as requesting login shell behavior. Appending a -l flag to the SallocDefaultCommand should be sufficient.
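For bash specifically, the effect of -l can be checked with the login_shell shell option. The sketch below assumes bash is installed and passes --noprofile and --norc so that no startup files run and the output stays predictable; other shells would need their own test.

```shell
#!/bin/sh
# Command evaluated inside each bash instance.
check='shopt -q login_shell && echo login || echo non-login'

without_l=$(bash --noprofile --norc -c "$check")     # as in the original default
with_l=$(bash --noprofile --norc -l -c "$check")     # with -l appended

echo "bash -c    : $without_l"
echo "bash -l -c : $with_l"
```

In a real session the profile files are read rather than suppressed, and that is the environment setup a login shell is meant to provide.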
Finally, with regard to environment variable propagation, adding --export=NONE to the SallocDefaultCommand would implement the best practice we seek to promote, but that behavior cannot be overridden with command line flags to salloc or via the environment (with SLURM_EXPORT_ENV). The only possible override is for a user to opt not to use the SallocDefaultCommand and instead provide an explicit command, e.g. an srun lacking the --export flag that appears in SallocDefaultCommand. The desired best practice must be assumed to be the dominant use case (and will correlate with official documentation, for example), so adding --export=NONE to the SallocDefaultCommand is the correct choice.
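A user who does want the submission environment propagated could wrap the explicit-command override in a small helper. The function name salloc_keep_env is hypothetical, and the srun flags are simply those of the altered default with --export=NONE left out; treat it as a sketch rather than a supported tool.

```shell
#!/bin/sh
# Hypothetical helper: request an allocation and start a login shell in it
# WITHOUT --export=NONE, so srun propagates the caller's exported environment.
# Any arguments are passed through to salloc (e.g. --partition=devel).
salloc_keep_env() {
    salloc "$@" srun --mpi=none --pty --cpu-bind=none "$SHELL" -l
}

# Usage, from a login node:
#   salloc_keep_env --partition=devel
```

Because an explicit command is given, salloc does not fall back to the SallocDefaultCommand.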
This yields an altered SallocDefaultCommand of:
SallocDefaultCommand="srun --mpi=none --pty --export=NONE --cpu-bind=none $SHELL -l"
No downtime is necessary since this change affects the behavior of the salloc command (not any of the Slurm daemons). The new configuration will be pushed to all nodes and take effect immediately.
Date | Time | Goal/Description |
---|---|---|
2022-01-12 | | Authoring of this document |
2022-01-19 | 09:00 | Implementation |