
Revision to Slurm node features

This document summarizes the addition of automatically gathered node features, which users can leverage to constrain more precisely the nodes on which a job executes.

Every compute node registered in Slurm has a list of zero or more features – strings that identify a functionality, identity, or other attribute associated with the node. On DARWIN, all compute nodes have always been statically-configured with a set of generational, CPU, and nominal features. Take, for example, these DARWIN nodes:

[user@login01.darwin ~]$ scontrol show node r1n00 | grep Features
   AvailableFeatures=standard,512GiB
   ActiveFeatures=standard,512GiB
 
[user@login01.darwin ~]$ scontrol show node r1t00 | grep Features
   AvailableFeatures=nvidia-gpu,nvidia-t4,t4,512GiB
   ActiveFeatures=nvidia-gpu,nvidia-t4,t4,512GiB
 
[user@login01.darwin ~]$ scontrol show node r2v00 | grep Features
   AvailableFeatures=nvidia-gpu,nvidia-v100,v100,768GiB
   ActiveFeatures=nvidia-gpu,nvidia-v100,v100,768GiB

A user can limit which nodes are permissible for a submitted job:

[user@login01.darwin ~]$ sbatch --constraint=512GiB …

With this constraint, r1n00 or r1t00 could be used to execute the job, but r2v00 could not.
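
To preview which nodes a given feature would admit before submitting a job, the per-node feature lists can be filtered. The following is a minimal sketch only: nodes_with_feature is a hypothetical helper (it is not part of Slurm) that reads lines of the form "<node> <comma-separated features>" on standard input, such as those printed by sinfo -N -h -o '%N %f'.

```shell
#!/bin/bash
# Hypothetical helper (not part of Slurm): read "<node> <features>" lines on
# stdin and print each node whose feature list contains the requested
# feature as an exact, whole-string match.
nodes_with_feature() {
    awk -v want="$1" '{
        n = split($2, feats, ",")
        for (i = 1; i <= n; i++) {
            # Appending "" forces a string comparison even for numeric-looking names
            if ((feats[i] "") == (want "")) { print $1; break }
        }
    }'
}
```

On a login node this might be used as sinfo -N -h -o '%N %f' | nodes_with_feature 512GiB | sort -u, where the sort -u removes repeats of nodes that belong to more than one partition.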

While these existing features can be useful, they do not directly assist in choosing nodes based on the hardware capabilities. Some software may demand a CPU with AVX512 ISA extensions, but Slurm does not inherently know whether or not a node's CPU has that capability, nor do our existing features directly indicate it.

A list of all ISA extensions supported by a CPU is present in a Linux system's /proc/cpuinfo file. It would be helpful if the statically-configured features that have always existed were augmented with additional features added dynamically by the Slurm software running on the compute node.

A Slurm plugin has been written by IT-RCI staff that synthesizes additional features by consulting a node's /proc/cpuinfo file. All features synthesized by the plugin are formatted as <TYPE>::<VALUE>, where the possible <TYPE> values are:

Type     Description
VENDOR   CPU vendor name (e.g. GenuineIntel or AuthenticAMD)
MODEL    succinct CPU model name extracted from the verbose name
CACHE    kilobytes of cache reported by the CPU
ISA      available ISA extensions (e.g. avx512f or sse4_1)
PCI      special PCI devices (e.g. GPUs)
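
How such features could be derived from /proc/cpuinfo can be sketched in shell. This is not the IT-RCI plugin, whose source is not reproduced here. The function name synth_features is hypothetical, and the sketch assumes the x86 layout in which each processor block carries a "vendor_id" line and a space-separated "flags" line. It emits every flag it finds, whereas the plugin appears to publish only a selected subset.

```shell
#!/bin/bash
# Hypothetical sketch, not the actual plugin: build VENDOR::<name> and
# ISA::<flag> strings from a cpuinfo-style file (default: /proc/cpuinfo)
# and print them as one comma-separated list.
synth_features() {
    local cpuinfo="${1:-/proc/cpuinfo}"
    awk -F': *' '
        # vendor_id and flags repeat for every logical CPU; use the first only
        $1 ~ /^vendor_id[ \t]*$/ && !seen_vendor {
            seen_vendor = 1
            print "VENDOR::" $2
        }
        $1 ~ /^flags[ \t]*$/ && !seen_flags {
            seen_flags = 1
            n = split($2, flag, " ")
            for (i = 1; i <= n; i++) print "ISA::" flag[i]
        }
    ' "$cpuinfo" | paste -sd, -
}
```

Called with no argument, synth_features reads the local /proc/cpuinfo. Passing a file name allows a saved copy from another node to be examined.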

This will yield augmented feature lists like:

[user@login01.darwin ~]$ scontrol show node r1n00 | grep Features
   AvailableFeatures=VENDOR::AuthenticAMD,MODEL::EPYC_7502,CACHE::512KB,ISA::sse,ISA::sse2,ISA::ssse3,ISA::sse4_1,ISA::sse4_2,ISA::avx,ISA::avx2,standard,512GiB
   ActiveFeatures=VENDOR::AuthenticAMD,MODEL::EPYC_7502,CACHE::512KB,ISA::sse,ISA::sse2,ISA::ssse3,ISA::sse4_1,ISA::sse4_2,ISA::avx,ISA::avx2,standard,512GiB
 
[user@login01.darwin ~]$ scontrol show node r1t00 | grep Features
   AvailableFeatures=VENDOR::AuthenticAMD,MODEL::EPYC_7502,CACHE::512KB,ISA::sse,ISA::sse2,ISA::ssse3,ISA::sse4_1,ISA::sse4_2,ISA::avx,ISA::avx2,PCI::GPU::T4,nvidia-gpu,nvidia-t4,t4,512GiB
   ActiveFeatures=VENDOR::AuthenticAMD,MODEL::EPYC_7502,CACHE::512KB,ISA::sse,ISA::sse2,ISA::ssse3,ISA::sse4_1,ISA::sse4_2,ISA::avx,ISA::avx2,PCI::GPU::T4,nvidia-gpu,nvidia-t4,t4,512GiB
 
[user@login01.darwin ~]$ scontrol show node r2v00 | grep Features
   AvailableFeatures=VENDOR::GenuineIntel,MODEL::8260,CACHE::36608KB,ISA::sse,ISA::sse2,ISA::ssse3,ISA::sse4_1,ISA::sse4_2,ISA::avx,ISA::avx2,ISA::avx512f,ISA::avx512dq,ISA::avx512cd,ISA::avx512bw,ISA::avx512vl,ISA::avx512_vnni,PCI::GPU::V100,nvidia-gpu,nvidia-v100,v100,768GiB
   ActiveFeatures=VENDOR::GenuineIntel,MODEL::8260,CACHE::36608KB,ISA::sse,ISA::sse2,ISA::ssse3,ISA::sse4_1,ISA::sse4_2,ISA::avx,ISA::avx2,ISA::avx512f,ISA::avx512dq,ISA::avx512cd,ISA::avx512bw,ISA::avx512vl,ISA::avx512_vnni,PCI::GPU::V100,nvidia-gpu,nvidia-v100,v100,768GiB
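
Feature lists of this length are tedious to scan by eye. As a minimal sketch, the hypothetical helper below (features_of_type is not a Slurm command) reads scontrol show node output on standard input and prints only the values of one <TYPE>, one per line. It assumes the AvailableFeatures= line format shown above.

```shell
#!/bin/bash
# Hypothetical helper: print the values of a single feature type taken from
# the AvailableFeatures= line of "scontrol show node" output on stdin.
features_of_type() {
    local type="$1"
    sed -n 's/^[[:space:]]*AvailableFeatures=//p' |
        tr ',' '\n' |
        awk -v prefix="${type}::" '
            # keep entries that begin with "<TYPE>::" and strip that prefix
            index($0, prefix) == 1 { print substr($0, length(prefix) + 1) }
        '
}
```

For example, scontrol show node r2v00 | features_of_type ISA would list just the ISA extensions of that node, and features_of_type PCI would list its special PCI devices.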

To submit a job that requires both the AVX512 Foundation (avx512f) and AVX512 Byte and Word (avx512bw) ISA extensions, the command would resemble this:

[user@login01.darwin ~]$ sbatch … --constraint='ISA::avx512f&ISA::avx512bw'

The syntax for using multiple features in a constraint is documented in the sbatch man page.
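
In that syntax, & requires all of the named features (AND) and | accepts any one of them (OR). Because both characters are also shell metacharacters, the expression must be quoted. As a minimal sketch, the hypothetical helper below (isa_constraint is not part of Slurm) joins ISA extension names into an AND-ed expression so the quoting is handled in one place.

```shell
#!/bin/bash
# Hypothetical helper: join ISA extension names into a single AND-ed
# constraint expression, e.g. "ISA::avx512f&ISA::avx512bw".
isa_constraint() {
    local expr="" ext
    for ext in "$@"; do
        # put "&" between terms, but not before the first one
        expr="${expr:+${expr}&}ISA::${ext}"
    done
    printf '%s\n' "$expr"
}
```

The result would be passed to sbatch inside double quotes, for instance --constraint="$(isa_constraint avx512f avx512bw)", so that the shell does not treat the & as a request to run the command in the background.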

The Slurm scheduler will be restarted to load the new plugin. Job submission and query commands (sbatch, sacct, and squeue, for example) will hang for a period anticipated to be less than one minute.

Date        Time   Goal/Description
2026-01-06         Authoring of this document
2026-01-13  10:00  Implementation
  • technical/slurm/darwin/synth_features.txt
  • Last modified: 2026-01-06 10:21
  • by frey