====== Revisions to Slurm Configuration v2.3.1 on Caviness ======
This document summarizes alterations to the Slurm job scheduler configuration on the Caviness cluster.
===== Issues =====
When the ''salloc'' command is used without an explicit command and its arguments, the value configured in the ''SallocDefaultCommand'' key (in ''/etc/slurm/slurm.conf'') provides the default command to execute on the allocated resources. For example:
[(workgroup:user)@login01 ~]$ salloc --partition=devel
salloc: Granted job allocation 13065047
salloc: Waiting for resource configuration
salloc: Nodes r00n56 are ready for job
[user@r00n56 ~]$
The default command as defined in the Slurm configuration mirrors the suggested default from the Slurm developers:
SallocDefaultCommand="srun -n1 -N1 --mpi=none --pty $SHELL"
Thus, the ''salloc'' command illustrated above is equivalent to:
[(workgroup:user)@login01 ~]$ salloc --partition=devel srun -n1 -N1 --mpi=none --pty $SHELL
salloc: Granted job allocation 13065047
salloc: Waiting for resource configuration
salloc: Nodes r00n56 are ready for job
[user@r00n56 ~]$
There are several issues with this default command:
- The inclusion of ''-n1 -N1'' limits the remote shell to accessing a single task of the allocation
- The remote shell (''$SHELL'', coming from the user's current environment) is not executed as a login shell
- The ''srun'' by default propagates the user's current environment variables to the remote node(s); we generally do not recommend this behavior on Caviness
On the first point, the Slurm allocation may have been for ''-N1 -n4 -c8'' (one node, four tasks, eight CPUs per task), but the remote shell will only have access to one task (with eight CPUs). The user more likely anticipated that the remote shell would have access to the full set of resources allocated on the primary node assigned to the job, akin to the batch step in a submitted job script.
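As a rough illustration of this first point, the following commands can be run inside the interactive shell to inspect how many CPUs the shell can actually use (a sketch only; the partition, node name, and reported values will vary by allocation):
[(workgroup:user)@login01 ~]$ salloc --partition=devel -N1 -n4 -c8
[user@r00n56 ~]$ nproc
[user@r00n56 ~]$ grep Cpus_allowed_list /proc/self/status
With the current default command, ''nproc'' and the kernel's affinity list should reflect only a single task's share of the allocation rather than all of the CPUs granted on the node.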
The second and third points may prevent some runtime environment setup from happening: exported environment variables are reconstituted in the runtime environment by Slurm, but unexported variables, aliases, and functions are not restored. Our best practice for job scripts is to send **no environment variables** from the submission environment to the runtime environment; ideally, the same should be observed for interactive sessions.
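For example, with the current default an exported variable set on the login node is carried into the interactive shell, while an alias defined alongside it is not (an illustrative sketch; ''MY_VAR'' and ''showdir'' are hypothetical names):
[(workgroup:user)@login01 ~]$ export MY_VAR=from_login
[(workgroup:user)@login01 ~]$ alias showdir='ls -l'
[(workgroup:user)@login01 ~]$ salloc --partition=devel
[user@r00n56 ~]$ echo $MY_VAR
[user@r00n56 ~]$ showdir
The ''echo'' should print the value exported on the login node, while ''showdir'' should fail as an unknown command, since only exported variables survive the ''srun'' environment propagation.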
===== Implementation =====
To address the issue of the full set of resources on the primary node not being made available to the remote shell, the node and task counts will be dropped from the ''SallocDefaultCommand''. The ''%%--%%cpu-bind=none'' flag will also be added: otherwise, ''slurmstepd'' applies a //task affinity mask// to the shell that restricts it to just one of the allocated physical CPU cores.
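After this first change alone, the default command would read as follows (an intermediate form; the flags addressing the remaining issues are added below):
SallocDefaultCommand="srun --mpi=none --pty --cpu-bind=none $SHELL"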
The majority of command shells recognize the ''-l'' flag as requesting login shell behavior. Appending a ''-l'' flag to the ''SallocDefaultCommand'' should be sufficient.
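A user whose ''$SHELL'' is bash could confirm the login-shell behavior from inside the interactive session with the following check (bash-specific; other shells have their own mechanisms):
[user@r00n56 ~]$ shopt -q login_shell && echo "login shell" || echo "not a login shell"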
Finally, with regard to environment variable propagation, adding ''%%--%%export=NONE'' to the ''SallocDefaultCommand'' would implement the best practice we seek to promote, but that behavior cannot be overridden with command line flags to ''salloc'' or via the environment (with ''SLURM_EXPORT_ENV''). The only possible override is for a user to forgo the ''SallocDefaultCommand'' and provide an explicit command, e.g. an ''srun'' lacking the ''%%--%%export'' flag that appears in ''SallocDefaultCommand''. The desired best practice must be assumed to be the dominant use case (and it will be consistent with official documentation), so adding ''%%--%%export=NONE'' to the ''SallocDefaultCommand'' is the correct choice.
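For the (presumably rare) user who does want the submission environment propagated, supplying an explicit command to ''salloc'' bypasses the ''SallocDefaultCommand'' entirely, e.g. (a sketch mirroring the new default but omitting ''%%--%%export=NONE''):
[(workgroup:user)@login01 ~]$ salloc --partition=devel srun --mpi=none --pty --cpu-bind=none $SHELL -l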
This yields an altered ''SallocDefaultCommand'' of:
SallocDefaultCommand="srun --mpi=none --pty --export=NONE --cpu-bind=none $SHELL -l"
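With this change in place, a bare ''salloc'' invocation such as the one at the top of this document should behave as though the user had typed:
[(workgroup:user)@login01 ~]$ salloc --partition=devel srun --mpi=none --pty --export=NONE --cpu-bind=none $SHELL -l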
===== Impact =====
No downtime is necessary since this change affects the behavior of the ''salloc'' command (not any of the Slurm daemons). The new configuration will be pushed to all nodes and take effect immediately.
===== Timeline =====
^Date ^Time ^Goal/Description ^
|2022-01-12| |Authoring of this document|
|2022-01-19|09:00|Implementation|