Revisions to Slurm Configuration v2.3.2 on Caviness

This document summarizes alterations to the Slurm job scheduler configuration on the Caviness cluster.

Two new racks (r05, r06) have been added to the Caviness cluster. Nodes in the new racks must be integrated into the Slurm configuration for job scheduling. First-time investing workgroups must be added to Slurm accounting, and all workgroups' QOS-based resource limits and fairshare factors must be updated.

  • The Slurm nodes.conf file will be modified to include the nodes in r05 and r06.
  • The Slurm partitions.conf file will be modified to:
    • Adjust node assignments for existing workgroups who purchased node(s) in r05, r06
    • Add new workgroups who purchased node(s) in r05, r06
  • The Slurm topology.conf file will be modified to include OPA switches/HFIs in r05, r06
    • The /opt/shared/slurm/add-ons/bin/opa2slurm utility (written by IT-RCI staff) will be used to automatically map the OPA network
  • The Slurm accounting database will be updated:
    • New workgroups will be added and populated with their members
    • For each workgroup, the calculated fairshare fraction (the workgroup's investment as a percentage of total dollars invested) will be updated
    • For each workgroup, the workgroup-partition maximum CPU/memory/GPU limits will be updated
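
The fairshare fraction described above can be sketched as a simple proportion. The following is a minimal illustration only, assuming each workgroup's fraction is its dollar investment divided by the total investment across all workgroups. The workgroup names and dollar amounts are hypothetical, and the step that loads the resulting values into the Slurm accounting database is not shown.

```python
def fairshare_fractions(investments):
    """Return each workgroup's fraction of the total dollar investment.

    `investments` maps a workgroup name to its dollar investment.
    Assumption: fairshare fraction = workgroup dollars / total dollars.
    """
    total = sum(investments.values())
    if total <= 0:
        raise ValueError("total investment must be positive")
    return {wg: amount / total for wg, amount in investments.items()}


# Hypothetical workgroups and dollar amounts, for illustration only.
investments = {"wg_alpha": 50_000, "wg_beta": 30_000, "wg_gamma": 20_000}
shares = fairshare_fractions(investments)

for wg, frac in sorted(shares.items()):
    print(f"{wg}: {frac:.4f}")
```

Because every fraction is relative to the total, adding a new investing workgroup changes the fraction of every existing workgroup, which is why all workgroups are updated rather than only the new ones.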

No downtime is expected. The configuration version will be bumped to v2.4.0.

Date        Time   Goal/Description
2023-05-27         Authoring of this document
2023-05-30  09:00  Implementation
  • technical/slurm/caviness/gen3-additions.txt
  • Last modified: 2023-05-30 11:53
  • by frey