Revisions to Slurm Configuration v2.3.2 on Caviness
This document summarizes alterations to the Slurm job scheduler configuration on the Caviness cluster.
Issues
Caviness Expansion 3
Two new racks (r05, r06) have been added to the Caviness cluster. Nodes in the new racks must be integrated into the Slurm configuration for job scheduling. First-time investing workgroups must be added to Slurm accounting, and all workgroups' QOS-based resource limits and fairshare factors must be updated.
Implementation
- The Slurm `nodes.conf` file will be modified to include r05, r06.
- The Slurm `partitions.conf` file will be modified to:
    - Adjust node assignments for existing workgroups that purchased node(s) in r05, r06
    - Add new workgroups that purchased node(s) in r05, r06
- The Slurm `topology.conf` file will be modified to include OPA switches/HFIs in r05, r06:
    - The `/opt/shared/slurm/add-ons/bin/opa2slurm` utility (written by IT-RCI staff) will be used to automatically map the OPA network
- The Slurm accounting database will be updated:
    - New workgroups will be added and populated with their members
    - For each workgroup, the calculated fairshare fraction (the workgroup's dollar investment as a percentage of the total) will be updated
    - For each workgroup, the workgroup-partition maximum CPU/memory/GPU limits will be updated

Illustrative sketches of each of these changes follow.
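As a minimal sketch, the `nodes.conf` additions might take the form below. The node ranges, CPU counts, memory sizes, GRES, and features shown are hypothetical placeholders, not the actual r05/r06 hardware.

```
# Hypothetical entries for the new racks; node counts, CPU counts,
# memory sizes, GRES, and features are illustrative only.
NodeName=r05n[00-47] CPUs=64 RealMemory=512000 Feature=Gen3,AVX512 State=UNKNOWN
NodeName=r06n[00-47] CPUs=64 RealMemory=512000 Gres=gpu:4 Feature=Gen3,AVX512 State=UNKNOWN
```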
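The `partitions.conf` changes follow the same pattern. In the hypothetical sketch below, an existing workgroup's partition (`it_css`) gains nodes in r05, and a first-time investing workgroup (`newgrp`) receives its first partition in r06; all names, node lists, and time limits are assumptions.

```
# Hypothetical: an existing workgroup's partition grows to include r05 nodes.
PartitionName=it_css Nodes=r00n[00-09],r05n[00-03] QOS=it_css State=UP MaxTime=7-00:00:00
# Hypothetical: a first-time investing workgroup's new partition in r06.
PartitionName=newgrp Nodes=r06n[12-15] QOS=newgrp State=UP MaxTime=7-00:00:00
```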
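Since `opa2slurm` is a site-local IT-RCI tool, its exact invocation is an assumption here; the sketch below assumes it emits topology stanzas on standard output, and the switch names and node ranges are hypothetical.

```
# Assumed invocation: regenerate the topology map once the r05/r06
# OPA switches and HFIs are online (check the utility's own help).
/opt/shared/slurm/add-ons/bin/opa2slurm > /etc/slurm/topology.conf

# The resulting topology.conf entries take this general form
# (hypothetical switch names and node ranges):
#   SwitchName=opa-leaf-r05 Nodes=r05n[00-47]
#   SwitchName=opa-leaf-r06 Nodes=r06n[00-47]
#   SwitchName=opa-core Switches=opa-leaf-r[00-06]
```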
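The accounting-database updates map naturally onto `sacctmgr` commands. The workgroup name (`newgrp`), user (`jdoe`), fairshare value, and TRES limits below are placeholders, not the actual values that will be applied.

```
# Hypothetical examples; the workgroup name (newgrp), user (jdoe),
# fairshare value, and TRES limits are placeholders.

# Add a first-time investing workgroup and populate it with a member.
sacctmgr -i add account newgrp Description="new workgroup" Organization=udel
sacctmgr -i add user jdoe Account=newgrp

# Update the calculated fairshare factor (percentage of total dollar investment).
sacctmgr -i modify account newgrp set Fairshare=37

# Update the workgroup-partition QOS limits (memory in MB).
sacctmgr -i modify qos newgrp set GrpTRES=cpu=256,mem=2048000,gres/gpu=8
```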
Impact
No downtime is expected. Once the changes are in place, the configuration version will be bumped to v2.4.0.
Timeline
| Date | Time | Goal/Description |
|---|---|---|
| 2023-05-27 | | Authoring of this document |
| 2023-05-30 | 09:00 | Implementation |