Revisions to Slurm Configuration v1.1.4 on Caviness
This document summarizes alterations to the Slurm job scheduler configuration on the Caviness cluster.
Issues
Staged reboots require RebootProgram
One feature of Slurm is that the sysadmin can request that node(s) be rebooted once all jobs running on them have completed. This sort of staged reboot of compute nodes is a useful way to handle rolling upgrades to the compute nodes' OS, for example, since the cluster schedules around the nodes that are awaiting reboot. Reboots are facilitated by an external program/command that must be explicitly configured in Slurm. At this time no RebootProgram is configured.
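For reference, a staged reboot is requested through scontrol once a RebootProgram is available; a minimal sketch, using a hypothetical node list:

```bash
# Ask Slurm to reboot the listed nodes once the jobs currently running
# on them have completed; this requires RebootProgram to be set in
# slurm.conf, which is not currently the case.
scontrol reboot r00n[00-09]
```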
Solutions
Add a RebootProgram
Graceful reboots handled by the Linux init/systemd processes have an unfortunate tendency to hang on the clusters: a cyclical dependency prevents the Lustre and OPA services from shutting down properly. Thus, we typically have the node's BMC power-cycle the node to effect a reboot. A simple script has been written that uses the local ipmitool if present, and otherwise attempts a forced reboot:
```bash
#!/bin/bash
#
# Force a reboot of the node.
#
IPMITOOL="$(which ipmitool)"
if [ $? -eq 0 ]; then
    "$IPMITOOL" power reset
else
    /usr/sbin/reboot --force
fi
```
Rather than installing this script on the NFS software share (/opt/shared/slurm), it will be sync'ed by Warewulf to each node's root filesystem. Should NFS service be down for some reason, slurmd will still be able to effect a requested reboot.
Add /etc/slurm/libexec
A local directory for Slurm support executables, /etc/slurm/libexec, will be added alongside the configuration files in compute node VNFS images. The directory will have permissions making it accessible only by the Slurm user and group. Future support executables (e.g. prolog/epilog scripts) will also be sync'ed to this directory.
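A minimal sketch of the intended layout, assuming the Slurm user and group are both named slurm and that the reboot script ends up at /etc/slurm/libexec/node-reboot:

```bash
# Create the support-executables directory, accessible only by the
# Slurm user and group.
install -d -m 0750 -o slurm -g slurm /etc/slurm/libexec

# Place the reboot script inside it.
install -m 0750 -o slurm -g slurm node-reboot /etc/slurm/libexec/node-reboot

# slurm.conf would then point at it, e.g.:
#   RebootProgram=/etc/slurm/libexec/node-reboot
```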
Implementation
Addition of the /etc/slurm/libexec directory to VNFS images and running root filesystems requires no reboots. The node-reboot script is imported into Warewulf and added to all compute node provisioning profiles for automatic distribution.
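With Warewulf 3 this might look like the following sketch; the source path, on-node path, and node pattern are illustrative, and the exact wwsh options should be verified against the installed version:

```bash
# Import the reboot script into the Warewulf datastore and record the
# on-node path and permissions it should be provisioned with.
wwsh file import /root/node-reboot --name=node-reboot
wwsh file set node-reboot --path=/etc/slurm/libexec/node-reboot --mode=0750

# Attach the file to the compute nodes so Warewulf syncs it into each
# node's running root filesystem.
wwsh provision set "r*" --fileadd=node-reboot
```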
Slurm changes are effected by altering the configuration files, pushing the changed files to all nodes, and signaling all daemons to refresh their configuration.
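On the Slurm side the final steps amount to something like the following sketch; the copy command and node list are illustrative, as the configuration files may be distributed by other means:

```bash
# Push the updated slurm.conf to all compute nodes (hypothetical node
# list), then tell slurmctld and every slurmd to re-read it; running
# jobs are not affected.
pdcp -w r00n[00-09] /etc/slurm/slurm.conf /etc/slurm/slurm.conf
scontrol reconfigure
```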
Impact
No downtime is expected. As this change is entirely operational in nature and has no effect on users, it will be made without the usual review by stakeholders and users.
Timeline
| Date | Time | Goal/Description |
| --- | --- | --- |
| 2019-05-22 | | Authoring of this document |
| | | Changes made |