====== Revisions to Slurm Configuration v1.1.3 on Caviness ======
This document summarizes alterations to the Slurm job scheduler configuration on the Caviness cluster.

===== Issues =====

==== Nominal node memory size is not an appropriate limit ====

When the v1.1.3 configuration was activated, nodes began transitioning to the DRAIN state with the reason:

<code>
Reason=Low RealMemory
</code>

Each node runs a Slurm job execution daemon (slurmd) that reports back to the scheduler every few minutes; included in that report are the node's base resource levels, such as its core count and the physical memory it actually has available (RealMemory).
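
What a given node's slurmd would report can be checked directly on the node itself; a minimal sketch (the node name is taken from the example later in this document):

<code>
# Run on the compute node: prints the hardware slurmd detects, in slurm.conf
# format (CPUs, sockets, cores, RealMemory in MB, and so on). The RealMemory
# figure is what the scheduler compares against the configured value.
ssh r00n22 slurmd -C
</code>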

<WRAP negative round>
Slurm //drains// a node when the resource levels reported by its slurmd fall below those in the node's configuration; with RealMemory set to the nominal hardware size, every node reports slightly less memory than the configured value.
</WRAP>

Many nodes transitioned to the DRAIN state within the first 30 minutes after the v1.1.3 changes were activated.
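
The affected nodes and the reason recorded for each can be listed with standard Slurm tools, for example:

<code>
# List nodes currently in the DRAIN state together with the recorded reason.
sinfo -N -t drain -o "%N %E"
</code>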

The changes did not need to be rolled back, however; the revisions described in this document address the underlying problem.
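
For reference, a node drained for this reason is normally returned to service by an administrator once the configured limit is corrected; a sketch (node name illustrative):

<code>
# Clear the DRAIN state so the node accepts jobs again (administrative command).
scontrol update NodeName=r00n22 State=RESUME
</code>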

One additional problem could present itself under the v1.1.3 use of nominal physical memory size for the nodes. Consider the following:

  * A node runs a job requesting 28 cores and 100 GiB of memory, leaving 8 cores and 28 GiB of memory available according to the node configuration.
  * A second job from a different user, requesting 4 cores and 28 GiB of memory, is scheduled on the node.

Since the OS itself occupies some non-trivial amount of the physical memory, the second job eventually pushes total memory usage beyond the amount of physical memory actually present.
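
The gap is easy to observe on any node: the memory the kernel actually makes available is always somewhat below the nominal hardware size. For instance:

<code>
# Total usable memory as seen by the kernel, in GiB; on a nominal "128 GiB"
# node this is noticeably less than 128 once firmware and kernel reservations
# are subtracted.
free -g | head -n 2
</code>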

<WRAP negative round>
Choosing to use the nominal memory size of each node for its RealMemory limit was meant to keep requests for a node's full advertised memory (e.g. ''--mem=128G'' on a 128 GiB node) schedulable, but it leaves no headroom for the OS itself.
</WRAP>

==== FastSchedule requires explicit specification of all resources ====

In previous configurations the scheduler used the resource levels each slurmd reported when it registered; under the v1.1.3 FastSchedule setting, only the values stated explicitly in the node configuration are honored, and any resource left unspecified (such as TmpDisk) defaults to zero:

<code>
$ scontrol show node r00n22
NodeName=r00n22 Arch=x86_64 CoresPerSocket=18
   :
   TmpDisk=0
   :
</code>
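
This behavior follows from the FastSchedule parameter named in the heading above; a minimal sketch of the relevant ''slurm.conf'' lines, assuming FastSchedule=1 and with purely illustrative node values:

<code>
# slurm.conf (sketch): with FastSchedule=1 the scheduler trusts only the values
# written here, and anything omitted (e.g. TmpDisk) is treated as zero.
FastSchedule=1
NodeName=r00n22 Sockets=2 CoresPerSocket=18 ThreadsPerCore=1 RealMemory=131072
</code>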

Any user submitting a job which requests a minimum amount of ''/tmp'' scratch space would find no node able to satisfy the request, since every node appeared to have zero temporary disk.
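
For example, a request of the following form could never be satisfied while every node advertised zero temporary disk (the size and script name are placeholders):

<code>
# Ask for at least 10 GiB of local scratch on the assigned node; with TmpDisk
# unset (and therefore 0) no node can meet this requirement.
sbatch --tmp=10G job.sh
</code>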

<WRAP negative round>
Slurm //assumes// a value of zero for any resource, such as TmpDisk, that is not explicitly listed in a node's configuration.
</WRAP>

This situation was addressed by augmenting the node configurations with explicit TmpDisk values shortly after the v1.1.3 configuration was initially activated.
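
A sketch of what such an augmented node entry might look like in ''slurm.conf'' (all values are illustrative, not the actual Caviness settings):

<code>
# Node entry with the temporary disk capacity stated explicitly (TmpDisk is
# given in MB), so jobs using --tmp can be scheduled again.
NodeName=r00n22 Sockets=2 CoresPerSocket=18 ThreadsPerCore=1 RealMemory=131072 TmpDisk=900000
</code>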

===== Solutions =====

==== Use realistic RealMemory levels ====

For each type of node present in Caviness, a RealMemory size will be chosen that is less than the value reported by slurmd, preventing the DRAIN state transitions described above.
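
One way to arrive at such a value, sketched below, is to start from what slurmd actually detects and round down; the rounding margin here is an assumption, not the procedure actually used on Caviness:

<code>
# Run on a node of the given type: take the RealMemory (MB) slurmd detects and
# round it down to the next-lower GiB boundary as a conservative configured limit.
detected=$(slurmd -C | grep -o 'RealMemory=[0-9]*' | cut -d= -f2)
echo "suggested RealMemory: $(( detected / 1024 * 1024 )) MB"
</code>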

<WRAP positive round>
Node configurations will be updated to reflect the chosen sub-nominal RealMemory sizes.
</WRAP>

The //effective// memory limit on each node will therefore sit somewhat below its nominal size.
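
Once the revised configuration is active, the limit actually enforced on any node can be checked directly; for example:

<code>
# Configured memory (MB) per node, cluster-wide and for a single node.
sinfo -N -o "%N %m"
scontrol show node r00n22 | grep RealMemory
</code>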

<note important>Users whose job scripts request a node's full nominal memory should adjust those requests to fit within the new, slightly lower RealMemory limits.</note>

===== Implementation =====

All changes are effected by altering the Slurm configuration files, pushing the changed files to all nodes, and signaling the daemons to re-read their configuration.
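
In outline, the procedure looks roughly like the following sketch; the file distribution step is an assumption, since Caviness may use its own tooling to push configuration files:

<code>
# 1. Edit slurm.conf (node definitions, RealMemory, TmpDisk) on the management host.
# 2. Copy the updated file(s) to the same path on every node.
# 3. Ask all Slurm daemons to re-read their configuration; running jobs are not affected.
scontrol reconfigure
</code>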

===== Impact =====

No downtime is expected to be required.

===== Timeline =====

^Date ^Time ^Goal/Milestone^
|2019-02-04| |Authoring of this document|
|2019-02-06| |Document shared with Caviness community for feedback|
|2019-02-13| |Add announcement of impending change to login banner|
|2019-02-18|09: |Activate the revised node configurations|
| |09: | |
|2019-02-20| |Remove announcement from login banner|