Revision to Slurm node features

This document summarizes the addition of automatically-gathered node features, which users can leverage to better constrain the nodes on which a job executes.

Every compute node registered in Slurm has a list of zero or more features: strings that identify a functionality, identity, or other attribute associated with the node. On Caviness, all compute nodes have always been statically configured with a set of generational, CPU, and nominal features. Take, for example, these two Caviness nodes:

[user@login01.caviness ~]$ scontrol show node r00n14 | grep Features
   AvailableFeatures=Gen1,E5-2695,E5-2695v4,128GB
   ActiveFeatures=Gen1,E5-2695,E5-2695v4,128GB
 
[user@login01.caviness ~]$ scontrol show node r05n14 | grep Features
   AvailableFeatures=Gen3,Intel,Gold-6240R,6240R,384GB
   ActiveFeatures=Gen3,Intel,Gold-6240R,6240R,384GB
A user can limit which nodes are permissible for a submitted job. For example, submitting with

[user@login01.caviness ~]$ sbatch --constraint=Gen3 …

means r05n14 could be used to execute the job but r00n14 could not.

While these existing features can be useful, they do not directly help in choosing nodes based on hardware capabilities. Some software may demand a CPU with AVX512 ISA extensions, but Slurm does not inherently know whether a node's CPU has that capability, and our existing features do not directly indicate it. The Intel Gold 6240R processor does implement the AVX512 ISA extensions, so the user might be tempted to submit the job with the --constraint=Gen3 option. This would work unless a Gen3 GPU node were selected: the AMD CPUs in those nodes do not support AVX512.

Every ISA extension supported by a CPU is listed in a Linux system's /proc/cpuinfo file. It would be helpful to augment the statically-configured features that have always existed with additional features generated dynamically by the Slurm software running on each compute node.
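Independent of any Slurm feature, a job script can also check /proc/cpuinfo itself before launching a program that needs particular extensions, and fail fast on an unsuitable node. The following is a minimal sketch, assuming an x86 Linux node where the ISA extensions appear on the `flags` line; the function name `cpu_has_flags` is hypothetical and is not part of Slurm or of the plugin described below.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of Slurm or the IT-RCI plugin): succeed only
# if every named ISA extension appears on the first "flags" line of a
# cpuinfo-format file. Assumes x86 Linux, where /proc/cpuinfo has that line.
# Usage: cpu_has_flags FILE FLAG...
cpu_has_flags() {
    local cpuinfo="$1"
    shift
    local line f
    # Grab the first "flags" line; logical CPUs typically repeat the same list.
    line=$(grep -m1 '^flags' "$cpuinfo") || return 1
    # Strip the "flags : " label and pad with spaces for whole-word matching.
    line=" ${line#*:} "
    for f in "$@"; do
        # Whole-word test, so "avx512f" cannot match inside "avx512fp16".
        [[ "$line" == *" $f "* ]] || return 1
    done
    return 0
}

# Example guard near the top of a batch script:
#   if ! cpu_has_flags /proc/cpuinfo avx512f avx512bw; then
#       echo "$(hostname): CPU lacks avx512f/avx512bw" >&2
#       exit 1
#   fi
```

Such a check only detects a bad placement after the job has started; a constraint that keeps the job off unsuitable nodes in the first place avoids wasting the allocation.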

IT-RCI staff have written a Slurm plugin that synthesizes additional features by consulting a node's /proc/cpuinfo file. Every feature synthesized by the plugin is formatted as <TYPE>::<VALUE>, where <TYPE> is one of:

Type     Description
VENDOR   CPU vendor name (e.g. GenuineIntel or AuthenticAMD)
MODEL    succinct CPU model name extracted from the verbose name
CACHE    kilobytes of cache reported by the CPU
ISA      available ISA extensions (e.g. avx512f or sse4_1)
PCI      special PCI devices (e.g. GPUs)
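To illustrate the <TYPE>::<VALUE> convention, here is a minimal sketch of how VENDOR and ISA features could be derived from a cpuinfo-format file. It is not the IT-RCI plugin's code: the function name `synth_features` and the ISA whitelist (chosen to match the flags appearing in the example output) are assumptions, and the MODEL, CACHE, and PCI types are omitted.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of TYPE::VALUE feature synthesis from a cpuinfo-format
# file. Only VENDOR and ISA are produced; the ISA whitelist is an assumption
# chosen to match the flags shown in the example output, not the plugin's list.
synth_features() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    local vendor flags isa
    # First "vendor_id" and "flags" values (label and value separated by ":").
    vendor=$(awk -F': *' '/^vendor_id/ {print $2; exit}' "$cpuinfo")
    flags=$(awk -F': *' '/^flags/ {print $2; exit}' "$cpuinfo")
    local -a out=("VENDOR::${vendor}")
    for isa in sse sse2 ssse3 sse4_1 sse4_2 avx avx2 \
               avx512f avx512dq avx512cd avx512bw avx512vl avx512_vnni; do
        # Emit ISA::<flag> only for whole-word matches in the flags list.
        if [[ " $flags " == *" $isa "* ]]; then
            out+=("ISA::${isa}")
        fi
    done
    # Join with commas, the separator used in Slurm feature lists.
    (IFS=,; printf '%s\n' "${out[*]}")
}

# Example: synth_features /proc/cpuinfo
```

A whitelist keeps the feature list short; emitting every flag from /proc/cpuinfo would also work but would bury the useful ISA features among dozens of unrelated ones.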

This will yield augmented feature lists like:

[user@login01.caviness ~]$ scontrol show node r00n14 | grep Features
   AvailableFeatures=Gen1,E5-2695,E5-2695v4,128GB,VENDOR::GenuineIntel,MODEL::E5-2695_v4,CACHE::46080KB,ISA::sse,ISA::sse2,ISA::sse4_1,ISA::sse4_2,ISA::avx,ISA::avx2
   ActiveFeatures=Gen1,E5-2695,E5-2695v4,128GB,VENDOR::GenuineIntel,MODEL::E5-2695_v4,CACHE::46080KB,ISA::sse,ISA::sse2,ISA::sse4_1,ISA::sse4_2,ISA::avx,ISA::avx2
 
[user@login01.caviness ~]$ scontrol show node r00g00 | grep Features
   AvailableFeatures=Gen1,E5-2695,E5-2695v4,128GB,PCI::GPU::P100,VENDOR::GenuineIntel,MODEL::E5-2695_v4,CACHE::46080KB,ISA::sse,ISA::sse2,ISA::ssse3,ISA::sse4_1,ISA::sse4_2,ISA::avx,ISA::avx2
   ActiveFeatures=Gen1,E5-2695,E5-2695v4,128GB,PCI::GPU::P100,VENDOR::GenuineIntel,MODEL::E5-2695_v4,CACHE::46080KB,ISA::sse,ISA::sse2,ISA::ssse3,ISA::sse4_1,ISA::sse4_2,ISA::avx,ISA::avx2
 
[user@login01.caviness ~]$ scontrol show node r05n14 | grep Features
   AvailableFeatures=Gen3,Intel,Gold-6240R,6240R,384GB,VENDOR::GenuineIntel,MODEL::Gold_6230,CACHE::28160KB,ISA::sse,ISA::sse2,ISA::sse4_1,ISA::sse4_2,ISA::avx,ISA::avx2,ISA::avx512f,ISA::avx512dq,ISA::avx512cd,ISA::avx512bw,ISA::avx512vl,ISA::avx512_vnni
   ActiveFeatures=Gen3,Intel,Gold-6240R,6240R,384GB,VENDOR::GenuineIntel,MODEL::Gold_6230,CACHE::28160KB,ISA::sse,ISA::sse2,ISA::sse4_1,ISA::sse4_2,ISA::avx,ISA::avx2,ISA::avx512f,ISA::avx512dq,ISA::avx512cd,ISA::avx512bw,ISA::avx512vl,ISA::avx512_vnni
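Because these features appear in Slurm's node information, they can also be used to see which nodes would satisfy a set of features before submitting. A minimal sketch, assuming input in the form produced by `sinfo -N -h -o "%N %f"` (one node name and its comma-separated available features per line, with the feature column not truncated); the function name `nodes_with_features` is hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: read "NODE FEATURES" lines on stdin and print each
# node whose comma-separated feature list contains every feature named on
# the command line (exact, whole-feature matches).
nodes_with_features() {
    local node feats f ok
    while read -r node feats; do
        ok=1
        for f in "$@"; do
            # Pad with commas so "ISA::avx2" cannot match "ISA::avx2x".
            [[ ",$feats," == *",$f,"* ]] || { ok=0; break; }
        done
        if (( ok )); then
            echo "$node"
        fi
    done
}

# Example (a node in several partitions is listed once per partition,
# hence the sort -u):
#   sinfo -N -h -o "%N %f" | nodes_with_features ISA::avx512f ISA::avx512bw | sort -u
```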

A job that requires the AVX512 Foundation (avx512f) and AVX512 Byte and Word (avx512bw) ISA extensions would be submitted with a command resembling:

[user@login01.caviness ~]$ sbatch … --constraint='ISA::avx512f&ISA::avx512bw'

The existing features can still be combined with the new ones to further qualify the hardware selection; for example, to also require a Gen2 node:

[user@login01.caviness ~]$ sbatch … --constraint='Gen2&ISA::avx512f&ISA::avx512bw'

The syntax for combining multiple features in a constraint (for example, & for AND and | for OR) is documented in the sbatch man page.
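When several ISA extensions are required, the AND expression can be assembled from a list of extension names rather than typed by hand. A minimal sketch; the function name `isa_constraint` and the batch script name `job.sh` are hypothetical.

```bash
#!/usr/bin/env bash
# Hypothetical helper: join ISA extension names into an AND constraint
# using the ISA::<VALUE> feature naming (e.g. ISA::avx512f&ISA::avx512bw).
isa_constraint() {
    local out="" f
    for f in "$@"; do
        # Prefix "&" only between terms, never before the first one.
        out+="${out:+&}ISA::${f}"
    done
    printf '%s\n' "$out"
}

# Example (double quotes keep the shell from treating "&" specially):
#   sbatch --constraint="Gen2&$(isa_constraint avx512f avx512bw)" job.sh
```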

The Slurm scheduler will be restarted to load the updated job submission plugin. Job submission and queries (via sbatch, sacct, or squeue, for example) will hang for a period anticipated to be less than one minute.

Date        Time   Goal/Description
2025-11-13         Authoring of this document
2025-12-01  10:00  Implementation
technical/slurm/caviness/synth_features.txt · Last modified: 2025-11-19 13:53 by frey