Revision to Slurm node features
This document summarizes the addition of automatically-gathered features associated with nodes, which users can leverage to better constrain the nodes on which a job executes.
Issues
Every compute node registered in Slurm has a list of zero or more features – strings that identify a functionality, identity, or other attribute associated with the node. On DARWIN, all compute nodes have always been statically-configured with a set of generational, CPU, and nominal features. Take, for example, these DARWIN nodes:
```
[user@login01.darwin ~]$ scontrol show node r1n00 | grep Features
   AvailableFeatures=standard,512GiB
   ActiveFeatures=standard,512GiB
[user@login01.darwin ~]$ scontrol show node r1t00 | grep Features
   AvailableFeatures=nvidia-gpu,nvidia-t4,t4,512GiB
   ActiveFeatures=nvidia-gpu,nvidia-t4,t4,512GiB
[user@login01.darwin ~]$ scontrol show node r2v00 | grep Features
   AvailableFeatures=nvidia-gpu,nvidia-v100,v100,768GiB
   ActiveFeatures=nvidia-gpu,nvidia-v100,v100,768GiB
```
A user can limit which nodes are permissible for a submitted job:
```
[user@login01.darwin ~]$ sbatch --constraint=512GiB …
```
With this constraint, r1n00 or r1t00 could be used to execute the job, but r2v00 could not.
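The same constraint can also be placed in a job script. A minimal sketch follows; the script contents beyond the directives are hypothetical and shown only to illustrate the directive form:

```bash
#!/bin/bash
#SBATCH --job-name=constraint-demo
#SBATCH --constraint=512GiB
#
# Only nodes whose feature list includes 512GiB (r1n00 and r1t00 above)
# are eligible to run this job.

echo "Running on $(hostname)"
```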
While these existing features can be useful, they do not directly assist in choosing nodes based on hardware capabilities. Some software may demand a CPU with AVX-512 ISA extensions, for example, but Slurm does not inherently know whether a node's CPU has that capability, nor do our existing features directly indicate it.
A list of all ISA extensions supported by a CPU is present in a Linux system's /proc/cpuinfo file. It would be helpful if the statically-configured features that have always existed were augmented with additional features generated dynamically by the Slurm software running on each compute node.
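For reference, the ISA extensions appear on the flags line of /proc/cpuinfo; a command along these lines (illustrative only, not part of the new tooling) checks a node for AVX-512 support:

```
$ grep -m1 '^flags' /proc/cpuinfo | tr ' ' '\n' | grep '^avx512'
```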
Implementation
IT-RCI staff have written a Slurm plugin that synthesizes additional features by consulting a node's /proc/cpuinfo file. All features synthesized by the plugin are formatted as <TYPE>::<VALUE>, where the possible <TYPE> values are:
| Type | Description |
|---|---|
| VENDOR | CPU vendor name (e.g. GenuineIntel or AuthenticAMD) |
| MODEL | succinct CPU model name extracted from the verbose name |
| CACHE | kilobytes of cache reported by the CPU |
| ISA | available ISA extensions (e.g. avx512f or sse4_1) |
| PCI | special PCI devices (e.g. GPUs) |
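The plugin runs inside Slurm itself; purely as an illustration of where the values come from, the following shell sketch derives VENDOR::, CACHE::, and ISA:: strings from /proc/cpuinfo. This is not the plugin's actual code, the whitelist of ISA flags is an assumption, and MODEL:: and PCI:: synthesis is omitted:

```bash
#!/bin/bash
# Illustrative sketch only: emit feature strings in the <TYPE>::<VALUE> form
# described above, using the information available in /proc/cpuinfo.

# CPU vendor string, e.g. VENDOR::AuthenticAMD
vendor=$(awk -F': ' '/^vendor_id/ { print $2; exit }' /proc/cpuinfo)
echo "VENDOR::${vendor}"

# Cache size as reported by the CPU, e.g. CACHE::512KB
cache=$(awk -F': ' '/^cache size/ { gsub(/ /, "", $2); print $2; exit }' /proc/cpuinfo)
echo "CACHE::${cache}"

# One ISA:: feature per recognized CPU flag, e.g. ISA::avx512f
for flag in $(awk -F': ' '/^flags/ { print $2; exit }' /proc/cpuinfo); do
    case "$flag" in
        sse|sse2|ssse3|sse4_1|sse4_2|avx|avx2|avx512*) echo "ISA::${flag}" ;;
    esac
done
```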
This will yield augmented feature lists like:
```
[user@login01.darwin ~]$ scontrol show node r1n00 | grep Features
   AvailableFeatures=VENDOR::AuthenticAMD,MODEL::EPYC_7502,CACHE::512KB,ISA::sse,ISA::sse2,ISA::ssse3,ISA::sse4_1,ISA::sse4_2,ISA::avx,ISA::avx2,standard,512GiB
   ActiveFeatures=VENDOR::AuthenticAMD,MODEL::EPYC_7502,CACHE::512KB,ISA::sse,ISA::sse2,ISA::ssse3,ISA::sse4_1,ISA::sse4_2,ISA::avx,ISA::avx2,standard,512GiB
[user@login01.darwin ~]$ scontrol show node r1t00 | grep Features
   AvailableFeatures=VENDOR::AuthenticAMD,MODEL::EPYC_7502,CACHE::512KB,ISA::sse,ISA::sse2,ISA::ssse3,ISA::sse4_1,ISA::sse4_2,ISA::avx,ISA::avx2,PCI::GPU::T4,nvidia-gpu,nvidia-t4,t4,512GiB
   ActiveFeatures=VENDOR::AuthenticAMD,MODEL::EPYC_7502,CACHE::512KB,ISA::sse,ISA::sse2,ISA::ssse3,ISA::sse4_1,ISA::sse4_2,ISA::avx,ISA::avx2,PCI::GPU::T4,nvidia-gpu,nvidia-t4,t4,512GiB
[user@login01.darwin ~]$ scontrol show node r2v00 | grep Features
   AvailableFeatures=VENDOR::GenuineIntel,MODEL::8260,CACHE::36608KB,ISA::sse,ISA::sse2,ISA::ssse3,ISA::sse4_1,ISA::sse4_2,ISA::avx,ISA::avx2,ISA::avx512f,ISA::avx512dq,ISA::avx512cd,ISA::avx512bw,ISA::avx512vl,ISA::avx512_vnni,PCI::GPU::V100,nvidia-gpu,nvidia-v100,v100,768GiB
   ActiveFeatures=VENDOR::GenuineIntel,MODEL::8260,CACHE::36608KB,ISA::sse,ISA::sse2,ISA::ssse3,ISA::sse4_1,ISA::sse4_2,ISA::avx,ISA::avx2,ISA::avx512f,ISA::avx512dq,ISA::avx512cd,ISA::avx512bw,ISA::avx512vl,ISA::avx512_vnni,PCI::GPU::V100,nvidia-gpu,nvidia-v100,v100,768GiB
```
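To see which nodes advertise a particular synthesized feature, the node-oriented output of sinfo can be filtered; for example (output format per the sinfo man page):

```
[user@login01.darwin ~]$ sinfo -N -o '%N %f' | grep 'ISA::avx512f'
```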
For a user to submit a job that requires the AVX-512 Foundation (avx512f) and AVX-512 Byte and Word (avx512bw) ISA extensions, the command would resemble this:
```
[user@login01.darwin ~]$ sbatch … --constraint='ISA::avx512f&ISA::avx512bw' …
```
The syntax for combining multiple features in a constraint is documented in the sbatch man page.
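As a quick reference (the man page remains authoritative), the ampersand requires all listed features and the vertical bar accepts any one of them:

```
[user@login01.darwin ~]$ sbatch … --constraint='VENDOR::GenuineIntel&ISA::avx512f' …
[user@login01.darwin ~]$ sbatch … --constraint='ISA::avx2|ISA::avx512f' …
```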
Impact
The Slurm scheduler will be restarted to load the new plugin. Job submission and query commands (sbatch, sacct, and squeue, for example) will hang during the restart; the interruption is anticipated to last less than one minute.
Timeline
| Date | Time | Goal/Description |
|---|---|---|
| 2026-01-06 | | Authoring of this document |
| 2026-01-13 | 10:00 | Implementation |