technical:slurm:reboot-and-helper-scripts [2019-05-22 12:27] – created frey | technical:slurm:reboot-and-helper-scripts [Unknown date] (current) – removed - external edit (Unknown date) 127.0.0.1 | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ====== Revisions to Slurm Configuration v1.1.4 on Caviness ====== | ||
- | This document summarizes alterations to the Slurm job scheduler configuration on the Caviness cluster. | ||
- | |||
- | ===== Issues ===== | ||
- | |||
- | ==== Staged reboots require RebootProgram ==== | ||
- | |||
- | Slurm lets the sysadmin request that nodes be rebooted once all jobs running on them have completed. | ||
- | |||
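As a hedged illustration of the staged reboot described above: ''scontrol reboot'' asks Slurm to reboot the listed nodes once they are idle, and the reboot itself is carried out by whatever program ''RebootProgram'' names in the Slurm configuration. The sketch below only builds the command line; the node names and the helper function are hypothetical and are not taken from the Caviness configuration.

```shell
#!/bin/bash
# Sketch only: the node names and this helper are assumptions for illustration.

# Build the scontrol command line for a staged reboot of the given nodes.
# "$*" joins the arguments using the first character of IFS (a comma here),
# which matches the comma-separated node list form that scontrol accepts.
staged_reboot_cmd() {
    local IFS=,
    echo "scontrol reboot $*"
}

# Print the command rather than running it, so it can be reviewed first.
staged_reboot_cmd node01 node02
```

Printing the command instead of executing it keeps the sketch safe to try on a machine without Slurm installed.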
- | ===== Solutions ===== | ||
- | |||
- | ==== Add a RebootProgram ==== | ||
- | |||
- | Graceful reboots handled by the Linux init/ | ||
- | |||
- | < | ||
- | #!/bin/bash | ||
- | # | ||
- | # Force a reboot of the node. | ||
- | # | ||
- | |||
- | IPMITOOL=" | ||
- | if [ $? -eq 0 ]; then | ||
- | " | ||
- | else | ||
- | / | ||
- | fi | ||
- | |||
- | </ | ||
- | |||
- | Rather than installing this script on the NFS software share (''/ | ||
- | |||
- | === Add / | ||
- | |||
- | A local Slurm-support executables directory — ''/ | ||
- | |||
- | ===== Implementation ===== | ||
- | |||
- | Addition of the ''/ | ||
- | |||
- | Slurm changes are made by editing the configuration files, pushing the changed files to all nodes, and signaling the daemons to reload their configuration. | ||
- | |||
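The edit, push, and signal sequence described above can be sketched roughly as follows. This is a minimal sketch, not the procedure actually used on Caviness: the configuration path, the node names, and the use of ''scp'' to distribute files are all assumptions. ''scontrol reconfigure'' is the standard Slurm command that tells the daemons to re-read their configuration.

```shell
#!/bin/bash
# Sketch only: the path, node names, and scp-based copy are assumptions.

# Print the command when DRY_RUN=1 (the default here); otherwise execute it.
run() {
    if [ "${DRY_RUN:-1}" = "1" ]; then
        echo "would run: $*"
    else
        "$@"
    fi
}

# push_and_reconfigure CONF NODE...
# Copy the changed configuration file to each node, then signal the daemons.
push_and_reconfigure() {
    local conf="$1"
    shift
    local node
    for node in "$@"; do
        run scp "$conf" "${node}:${conf}"
    done
    # Ask all Slurm daemons to re-read their configuration.
    run scontrol reconfigure
}

# Hypothetical usage: dry run against two example nodes.
push_and_reconfigure /etc/slurm/slurm.conf node01 node02
```

The dry-run default means the script only prints what it would do; setting ''DRY_RUN=0'' would execute the commands for real.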
- | ===== Impact ===== | ||
- | |||
- | No downtime is expected. | ||
- | |||
- | ===== Timeline ===== | ||
- | |||
- | ^Date ^Time ^Goal/ | ||
- | |2019-05-22| |Authoring of this document| | ||
- | | | |Changes made| |