====== Revisions to Slurm Configuration v1.1.3 on Caviness ======

This document summarizes alterations to the Slurm job scheduler configuration on the Caviness cluster.

===== Issues =====

==== Nominal node memory size is not an appropriate limit ====
When the v1.1.3 configuration was activated, nodes began transitioning to the DRAIN state with the reason:

<code>
Reason=Low RealMemory
</code>
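
The affected nodes and the reason recorded for each can be listed with ''sinfo''; a minimal sketch, with the user, timestamp, and node list elided:

<code>
$ sinfo -R
REASON               USER      TIMESTAMP            NODELIST
Low RealMemory       ...       ...                  ...
</code>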

Each node runs a Slurm job execution daemon (slurmd) that reports back to the scheduler every few minutes; included in that report are the base resource levels present on the node: CPU core count, physical memory size, and temporary disk space.

<WRAP negative round>
Slurm //drains// a node whenever a resource level reported by slurmd is lower than the corresponding level in the node's configuration.
</WRAP>
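
The levels a node reports can be checked directly on that node with ''slurmd -C''; a minimal sketch for a nominal 128 GiB node (the hostname and core layout match the examples in this document, but the exact output varies by Slurm version):

<code>
$ slurmd -C
NodeName=r00n22 CPUs=36 Boards=1 SocketsPerBoard=2 CoresPerSocket=18 ThreadsPerCore=1 RealMemory=128813
</code>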

Many nodes transitioned to the DRAIN state within the first 30 minutes after the v1.1.3 changes were activated: their configured RealMemory (the nominal size) exceeded the amount of memory slurmd actually reported.

The changes did not need to be rolled back, but each drained node had to be returned to service by an administrator.
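
For reference, a node drained in this way is returned to service with ''scontrol''; a minimal sketch using the node name from the example later in this document:

<code>
$ scontrol update NodeName=r00n22 State=RESUME
</code>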

One additional problem could present itself under the v1.1.3 use of nominal physical memory size for the nodes. Consider the following:

  * A node runs a job requesting 28 cores and 100 GiB of memory, leaving 8 cores and 28 GiB of memory available according to the node configuration.
  * A second job from a different user, requesting 4 cores and 28 GiB of memory, is scheduled on the node.

Since the OS itself occupies a non-trivial amount of the physical memory, the second job eventually pushes memory usage beyond the amount of physical memory actually present.
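
A quick tally of this example, using the slurmd-reported size of a nominal 128 GiB node from the table later in this document:

<code>
Node configuration (nominal):  128 GiB = 131072 MiB
Reported by slurmd:                      128813 MiB
Job 1 memory request:          100 GiB = 102400 MiB
Job 2 memory request:           28 GiB =  28672 MiB
Total granted:                 128 GiB = 131072 MiB  (2259 MiB more than slurmd reports)
</code>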

<WRAP negative round>
Choosing to use the nominal memory size of each node for its RealMemory limit was meant to keep requests like ''--mem=128G'' satisfiable on a 128 GiB node; the cost is that the memory granted to the jobs on a node can exceed the memory actually available.
</WRAP>

==== FastSchedule requires explicit specification of all resources ====

In previous configurations, the resource levels reported by slurmd were used directly when scheduling jobs. Under //FastSchedule// mode 1, the scheduler instead consults only the resource levels explicitly listed in each node's configuration, and any resource left unspecified assumes its default value. No TmpDisk sizes were specified, so every node advertised zero temporary disk space:

<code>
$ scontrol show node r00n22
NodeName=r00n22 Arch=x86_64 CoresPerSocket=18
   :
   TmpDisk=0
   :
</code>

Any user submitting a job which requests a minimum amount of ''/tmp'' disk space (the ''--tmp'' option to ''sbatch''/''salloc''/''srun'') would find that no node could ever satisfy the request.
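
For example, a submission of the following form (script name hypothetical) could never start while every node advertised TmpDisk=0:

<code>
$ sbatch --tmp=10G my_job.qs
</code>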

<WRAP negative round>
Slurm //will never schedule// a job whose resource request exceeds what every node in the cluster offers; such a job is either rejected at submission or left pending indefinitely.
</WRAP>

This situation was addressed by augmenting the node configurations with explicit TmpDisk values shortly after the v1.1.3 configuration was initially activated.
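
In the node definitions this amounts to adding a ''TmpDisk'' entry (in MB) to each line; a sketch in which the node range, socket/core layout, and disk size are illustrative rather than the actual Caviness values:

<code>
NodeName=r00n[00-55] Sockets=2 CoresPerSocket=18 ThreadsPerCore=1 RealMemory=131072 TmpDisk=921600
</code>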

===== Solutions =====

==== Use realistic RealMemory levels ====

For each type of node present in Caviness, a RealMemory size less than that reported by slurmd (to prevent DRAIN state transitions) will be chosen.

<WRAP positive round>
Node configurations will be updated to reflect the chosen sub-nominal RealMemory sizes.
</WRAP>

Under mode 1 of //FastSchedule//, the RealMemory value in the node configuration (rather than the value reported by slurmd) governs scheduling, so a sub-nominal value also reserves a margin of physical memory for the OS and avoids the overcommitment scenario described above.
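
In slurm.conf terms the combination looks like the following sketch; the node range, socket/core layout, and TmpDisk size are illustrative, and the RealMemory value is the proposed size for a 128 GiB node from the table below:

<code>
FastSchedule=1
NodeName=r00n[00-55] Sockets=2 CoresPerSocket=18 ThreadsPerCore=1 RealMemory=126976 TmpDisk=921600
</code>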

<WRAP positive round>
Workgroup QOS configurations will be updated to reflect the sum over sub-nominal RealMemory sizes rather than nominal sizes used in the v1.1.3 configuration.
</WRAP>

In v1.1.3 the node counts in workgroup QOSs were replaced by aggregate memory sizes which summed over the nominal sizes (128 GiB, 256 GiB, 512 GiB). In concert with changing the nodes' RealMemory sizes, the QOS aggregates must change as well.

==== Proposed RealMemory sizes ====

^Node type^(PHYS_PAGES*PAGESIZE)/MiB^Proposed RealMemory/MiB^Proposed RealMemory/GiB^
|Gen1/128 GiB|128813|126976|124|
|Gen1/256 GiB|257843|256000|250|
|Gen1/512 GiB|515891|514048|502|
|Gen1/ | | | |
|Gen1/ | | | |
|Gen1/ | | | |
|Gen1/ | | | |
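
For the Gen1/128, 256, and 512 GiB rows, the proposed sizes leave roughly 1.8 GiB of the slurmd-reported memory unallocatable to jobs, effectively reserving it for the OS:

<code>
126976 MiB = 124 GiB   (reported 128813 MiB, headroom 1837 MiB)
256000 MiB = 250 GiB   (reported 257843 MiB, headroom 1843 MiB)
514048 MiB = 502 GiB   (reported 515891 MiB, headroom 1843 MiB)
</code>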

A workgroup QOS which under v1.1.3 had an aggregate memory limit summed over nominal sizes (e.g. ''mem=128G'' for a single Gen1/128 GiB node) will instead have its limit summed over the proposed RealMemory sizes (''mem=126976M'', i.e. 124 GiB, for that same node).
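
A sketch of how such a limit is adjusted in the Slurm accounting database; the QOS name is a placeholder, the TRES string is abbreviated to memory only, and the value is in MiB to match the table:

<code>
$ sacctmgr modify qos name=<workgroup> set GrpTRES=mem=126976
</code>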

===== Implementation =====

All changes are effected by altering the Slurm configuration files, pushing the changed files to all nodes, and signaling a change so that all daemons re-read their configuration.
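
A minimal sketch of that sequence; the file-distribution step is site-specific and not shown, and QOS limits live in the Slurm accounting database, so they are adjusted separately with sacctmgr as above:

<code>
# after the edited slurm.conf has been copied to every node,
# instruct all Slurm daemons to re-read it:
$ scontrol reconfigure
</code>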

===== Impact =====

No downtime is expected to be required.

===== Timeline =====

^Date ^Time ^Goal/Milestone ^
|2019-02-18| |Authoring of this document|
|2019-02-18| |Document shared with Caviness community for feedback|
|2019-02-18| |Add announcement of impending change to login banner|
|2019-02-25|09: | |
| |09: | |
|2019-02-27| |Remove announcement from login banner|