technical:slurm:caviness:gen3-additions

technical:slurm:caviness:gen3-additions [2023-05-30 11:53] – created frey
technical:slurm:caviness:gen3-additions [2023-05-30 11:53] (current) frey
Line 1: Line 1:
 +====== Revisions to Slurm Configuration v2.3.2 on Caviness ======
  
 +This document summarizes alterations to the Slurm job scheduler configuration on the Caviness cluster.
 +
 +===== Issues =====
 +
 +==== Caviness Expansion 3 ====
 +
 +Two new racks (r05, r06) have been added to the Caviness cluster.  Nodes in the new racks must be integrated into the Slurm configuration for job scheduling.  First-time investing workgroups must be added to Slurm accounting, and all workgroups' QOS-based resource limits and fairshare factors must be updated.
 +
 +===== Implementation =====
 +
 +  * The Slurm ''nodes.conf'' file will be modified to include the nodes in r05 and r06.
 +  * The Slurm ''partitions.conf'' file will be modified to:
 +    * Adjust node assignments for existing workgroups that purchased node(s) in r05 or r06
 +    * Add new workgroups that purchased node(s) in r05 or r06
 +  * The Slurm ''topology.conf'' file will be modified to include the OPA switches/HFIs in r05 and r06
 +    * The ''/opt/shared/slurm/add-ons/bin/opa2slurm'' utility (written by IT-RCI staff) will be used to automatically map the OPA network
 +  * The Slurm accounting database will be updated:
 +    * New workgroups will be added and populated with their members
 +    * For each workgroup, the calculated fairshare fraction (the workgroup's dollar percentage of investment) will be updated
 +    * For each workgroup, the workgroup-partition maximum CPU/memory/GPU limits will be updated
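
The fairshare update above can be illustrated with a minimal sketch, assuming the fraction is each workgroup's dollars divided by the total dollars invested across all workgroups. The workgroup names, dollar amounts, and the ''scale'' used to turn fractions into integer shares are all hypothetical; this document does not specify how the values are actually computed or loaded into the accounting database.

```python
from fractions import Fraction


def fairshare_fractions(investments):
    """Return each workgroup's share of the total investment.

    `investments` maps a workgroup name to whole dollars (integers),
    so the fractions are exact and always sum to 1.
    """
    total = sum(investments.values())
    if total <= 0:
        raise ValueError("total investment must be positive")
    return {wg: Fraction(dollars, total) for wg, dollars in investments.items()}


def integer_shares(fractions, scale=10000):
    """Scale fractions to integer share counts.

    The scale of 10000 is an arbitrary assumption for illustration.
    """
    return {wg: round(frac * scale) for wg, frac in fractions.items()}


# Hypothetical data: two existing workgroups and one first-time investor.
investments = {"wg_a": 150000, "wg_b": 50000, "wg_new": 50000}

fractions = fairshare_fractions(investments)
shares = integer_shares(fractions)

for wg in sorted(investments):
    print(f"{wg}: fraction={float(fractions[wg]):.4f} shares={shares[wg]}")
```

Because every fraction depends on the total, adding a new investing workgroup changes the value for every existing workgroup, which is consistent with the need to update all workgroups rather than only the new ones.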
 +
 +===== Impact =====
 +
 +No downtime is expected.  The configuration version will be bumped from v2.3.2 to v2.4.0.
 +
 +===== Timeline =====
 +
 +^Date ^Time ^Goal/Description ^
 +|2023-05-27| |Authoring of this document|
 +|2023-05-30|09:00|Implementation|