Revision to Slurm node features
This document summarizes the addition of automatically gathered node features, which users can leverage to better constrain the nodes on which a job executes.
Issues
Every compute node registered in Slurm has a list of zero or more features – strings that identify a functionality, identity, or other attribute associated with the node. On Caviness, all compute nodes have always been statically-configured with a set of generational, CPU, and nominal features. Take, for example, these two Caviness nodes:
<code bash>
[user@login01.caviness ~]$ scontrol show node r00n14 | grep Features
AvailableFeatures=Gen1,E5-2695,E5-2695v4,128GB
ActiveFeatures=Gen1,E5-2695,E5-2695v4,128GB
[user@login01.caviness ~]$ scontrol show node r05n14 | grep Features
AvailableFeatures=Gen3,Intel,Gold-6240R,6240R,384GB
ActiveFeatures=Gen3,Intel,Gold-6240R,6240R,384GB
</code>
A user can limit which nodes are permissible for a submitted job. For example, submitting with
<code bash>
[user@login01.caviness ~]$ sbatch --constraint=Gen3 …
</code>
would mean r05n14 could be used to execute the job but r00n14 could not.
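To preview which nodes a set of features would admit, the feature lists that sinfo reports can be filtered. The following is a minimal sketch, not a Slurm command: nodes_with_features is a hypothetical helper that reads lines of the form "<node> <comma-separated features>" and prints the nodes carrying every requested feature.

```shell
# Hypothetical helper (not part of Slurm): read "<node> <features>" lines on
# stdin and print each node that has ALL features named as arguments.
nodes_with_features() {
    awk -v want="$*" '
        BEGIN { n = split(want, req, " ") }
        {
            split("", has)                      # clear the per-node feature set
            m = split($2, feats, ",")
            for (i = 1; i <= m; i++) has[feats[i]] = 1
            ok = 1
            for (i = 1; i <= n; i++) if (!(req[i] in has)) ok = 0
            if (ok) print $1
        }
    ' | sort -u                                 # a node may be listed once per partition
}

# Example use on a login node (sinfo prints node name and feature list):
#   sinfo -N -h -o '%N %f' | nodes_with_features Gen3
```

Features are compared as whole strings, so Gen3 does not match a hypothetical feature such as Gen30.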
While these existing features can be useful, they do not directly assist in choosing nodes based on the hardware capabilities. Some software may demand a CPU with AVX512 ISA extensions, but Slurm does not inherently know whether or not a node's CPU has that capability, nor do our existing features directly indicate it. The Intel 6240R processor does implement AVX512 ISA extensions, so the user might be tempted to use the --constraint=Gen3 option when submitting the job. This would work fine unless a Gen3 GPU node were selected: the AMD CPUs in those nodes do not have AVX512 capabilities.
A list of all ISA extensions supported by a CPU is present in a Linux system's /proc/cpuinfo file. It would be helpful if the existing list of statically-configured features were augmented by additional features added dynamically by the Slurm software running on the compute node.
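For illustration, the ISA extensions appear on the "flags" line of /proc/cpuinfo on x86 Linux systems. The sketch below shows one way to inspect them by hand; cpu_flags and has_cpu_flag are hypothetical helper names, and the optional file argument exists only so the functions can be tried against a saved copy of the file.

```shell
# Print the ISA extension flags from a cpuinfo-style file, one per line.
# Hypothetical helper; the file defaults to /proc/cpuinfo.
cpu_flags() {
    # Only the first "flags" line is read; every core repeats the same line
    # on a homogeneous node.
    grep -m1 '^flags' "${1:-/proc/cpuinfo}" | cut -d: -f2 | tr -s ' ' '\n' | sed '/^$/d'
}

# Succeed if the named flag (e.g. avx512f) is present as a whole word.
has_cpu_flag() {
    cpu_flags "${2:-/proc/cpuinfo}" | grep -qx "$1"
}

# Example use on a compute node:
#   has_cpu_flag avx512f && echo "AVX512 Foundation is available"
```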
Implementation
A Slurm plugin has been written by IT-RCI staff that synthesizes additional features by consulting a node's /proc/cpuinfo file. All features synthesized by the plugin are formatted as <TYPE>::<VALUE>, where the possible <TYPE> values are:
| Type | Description |
|---|---|
| VENDOR | CPU vendor name (e.g. GenuineIntel or AuthenticAMD) |
| MODEL | succinct CPU model name extracted from the verbose name |
| CACHE | kilobytes of cache reported by the CPU |
| ISA | available ISA extensions (e.g. avx512f or sse4_1) |
| PCI | special PCI devices (e.g. GPUs) |
This will yield augmented feature lists like:
<code bash>
[user@login01.caviness ~]$ scontrol show node r00n14 | grep Features
AvailableFeatures=Gen1,E5-2695,E5-2695v4,128GB,VENDOR::GenuineIntel,MODEL::E5-2695_v4,CACHE::46080KB,ISA::sse,ISA::sse2,ISA::sse4_1,ISA::sse4_2,ISA::avx,ISA::avx2
ActiveFeatures=Gen1,E5-2695,E5-2695v4,128GB,VENDOR::GenuineIntel,MODEL::E5-2695_v4,CACHE::46080KB,ISA::sse,ISA::sse2,ISA::sse4_1,ISA::sse4_2,ISA::avx,ISA::avx2
[user@login01.caviness ~]$ scontrol show node r00g00 | grep Features
AvailableFeatures=Gen1,E5-2695,E5-2695v4,128GB,PCI::GPU::P100,VENDOR::GenuineIntel,MODEL::E5-2695_v4,CACHE::46080KB,ISA::sse,ISA::sse2,ISA::ssse3,ISA::sse4_1,ISA::sse4_2,ISA::avx,ISA::avx2
ActiveFeatures=Gen1,E5-2695,E5-2695v4,128GB,PCI::GPU::P100,VENDOR::GenuineIntel,MODEL::E5-2695_v4,CACHE::46080KB,ISA::sse,ISA::sse2,ISA::ssse3,ISA::sse4_1,ISA::sse4_2,ISA::avx,ISA::avx2
[user@login01.caviness ~]$ scontrol show node r05n14 | grep Features
AvailableFeatures=Gen3,Intel,Gold-6240R,6240R,384GB,VENDOR::GenuineIntel,MODEL::Gold_6230,CACHE::28160KB,ISA::sse,ISA::sse2,ISA::sse4_1,ISA::sse4_2,ISA::avx,ISA::avx2,ISA::avx512f,ISA::avx512dq,ISA::avx512cd,ISA::avx512bw,ISA::avx512vl,ISA::avx512_vnni
ActiveFeatures=Gen3,Intel,Gold-6240R,6240R,384GB,VENDOR::GenuineIntel,MODEL::Gold_6230,CACHE::28160KB,ISA::sse,ISA::sse2,ISA::sse4_1,ISA::sse4_2,ISA::avx,ISA::avx2,ISA::avx512f,ISA::avx512dq,ISA::avx512cd,ISA::avx512bw,ISA::avx512vl,ISA::avx512_vnni
</code>
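To make the <TYPE>::<VALUE> derivation concrete, the sketch below builds VENDOR, CACHE, and ISA features from the vendor_id, cache size, and flags fields of a cpuinfo-style file. It is an illustration only and not the IT-RCI plugin: synthesize_features is a hypothetical name, the allow-list of ISA extensions is an assumption, and the MODEL and PCI types are omitted.

```shell
# Hypothetical sketch (not the actual plugin): print a comma-separated list of
# <TYPE>::<VALUE> features derived from a cpuinfo-style file.
synthesize_features() {
    awk -F'[ \t]*:[ \t]*' '
        # Keep the first value seen for each field, i.e. the first processor entry.
        $1 == "vendor_id"  && !vendor { vendor = $2 }
        $1 == "cache size" && !cache  { cache = $2; gsub(/ /, "", cache) }
        $1 == "flags"      && !flags  { flags = $2 }
        END {
            out = "VENDOR::" vendor ",CACHE::" cache
            # Assumed allow-list; the real plugin may advertise a different set.
            n = split("sse sse2 sse4_1 sse4_2 avx avx2 avx512f avx512bw", want, " ")
            m = split(flags, have, " ")
            for (i = 1; i <= m; i++) present[have[i]] = 1
            # Emit ISA features in allow-list order, skipping absent flags.
            for (i = 1; i <= n; i++)
                if (want[i] in present) out = out ",ISA::" want[i]
            print out
        }
    ' "${1:-/proc/cpuinfo}"
}
```

Filtering through an allow-list keeps the feature list short; /proc/cpuinfo typically reports far more flags than are useful as scheduling constraints.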
To submit a job that requires the AVX512 Foundation (avx512f) and AVX512 Byte and Word (avx512bw) ISA extensions, the command would resemble this:
[user@login01.caviness ~]$ sbatch … --constraint='ISA::avx512f&ISA::avx512bw' …
To further qualify the hardware selection, the existing Gen2 feature could still be used, for example:
[user@login01.caviness ~]$ sbatch … --constraint='Gen2&ISA::avx512f&ISA::avx512bw' …
The syntax for using multiple features in a constraint is documented in the sbatch man page.
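Because the & operator is also a shell metacharacter, the constraint expression must be quoted. As a convenience, a small wrapper can assemble the AND-ed expression from a list of extension names; isa_constraint below is a hypothetical helper, not part of Slurm, and job.sh is a placeholder script name.

```shell
# Hypothetical helper: join ISA extension names into an AND-ed constraint
# expression, prefixing each name with "ISA::".
isa_constraint() {
    constraint=""
    for isa in "$@"; do
        # Append "&" only between terms, never before the first one.
        constraint="${constraint:+$constraint&}ISA::$isa"
    done
    printf '%s\n' "$constraint"
}

# Example use; the double quotes keep the shell from interpreting "&":
#   sbatch --constraint="$(isa_constraint avx512f avx512bw)" job.sh
```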
Impact
The Slurm scheduler will be restarted to load the updated job submission plugin. Job submission and query commands (sbatch, sacct, and squeue, for example) will hang for a period anticipated to be less than one minute.
Timeline
| Date | Time | Goal/Description |
|---|---|---|
| 2025-11-13 | | Authoring of this document |
| 2025-12-01 | 10:00 | Implementation |