Specifying the correct node type and amount of memory is important because your allocation is [[abstract:darwin:runjobs:accounting|billed]] based on the Service Unit (SU), and each SU varies with the type of node and memory being used. If you specify a node type with a larger amount of memory, then you are charged accordingly even if you don't use it.

The table below provides the usable memory values for each type of node currently available on DARWIN. Please also see the [[abstract:darwin:runjobs:queues#maximum-requestable-memory|maximum requestable memory]] based on the partition (queue) specified and node type.

^Node type ^Slurm selection options ^RealMemory/MiB ^RealMemory/GiB^
|Standard/512 GiB |%%--%%partition=standard | 499712| 488|
|Large Memory/1 TiB |%%--%%partition=large-mem | 999424| 976|
|Extra-Large Memory/2 TiB |%%--%%partition=xlarge-mem | 2031616| 1984|
|nVidia-T4/512 GiB |%%--%%partition=gpu-t4 | 499712| 488|
|nVidia-V100/768 GiB |%%--%%partition=gpu-v100 | 737280| 720|
|amd-MI50/512 GiB |%%--%%partition=gpu-mi50 | 499712| 488|
|Extended Memory/3.73 TiB |%%--%%partition=extended-mem %%--exclusive%% | 999424| 976|
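For example, the ''Slurm selection options'' above can be used directly as job script directives. A minimal sketch selecting a Large Memory node (the 900 GiB request is only a placeholder and must stay at or below the RealMemory listed for the chosen node type):

<code bash>
#!/bin/bash
# Select a Large Memory (1 TiB) node and request 900 GiB for the job;
# this stays under the 976 GiB of RealMemory shown in the table above.
#SBATCH --partition=large-mem
#SBATCH --mem=900G
</code>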
The **Extended Memory** node is accessible by specifying the ''extended-mem'' partition together with the ''%%--%%exclusive'' option. Exclusive access means only one user may be on the node at a time: all of that user's jobs running on the node can share the full swap space, but no other user's jobs can run there during that time.
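For instance, such a job could be submitted along these lines (the script name ''submit.qs'' is a placeholder):

<code bash>
# Both options are required for the extended-memory node: the partition selects
# the node type, and --exclusive keeps other users' jobs (and their swap use) off it.
sbatch --partition=extended-mem --exclusive submit.qs
</code>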
<note important>**VERY IMPORTANT:** Keep in mind that not all of a node's physical memory can be requested, because a small amount is reserved for system use. As a result, the maximum amount of memory that can be specified is based on what Slurm shows as available. For example, the baseline nodes in DARWIN show a memory size of 488 GiB versus the 512 GiB of physical memory present in them. This means that if you try to specify the full amount of memory (i.e. 512G) for the ''standard'' partition, the job will be rejected; the same request will work if you specify a partition whose nodes have more memory (see the illustrative example below). You may also use ''%%--%%mem=0'' to request all of the available memory on a node.
</note>
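As a rough illustration (the script name ''submit.qs'' is a placeholder, and the memory figures come from the table above), a 512 GiB request only fits on partitions whose nodes report at least that much usable memory:

<code bash>
# Rejected: 512G exceeds the 488 GiB (499712 MiB) Slurm reports for standard nodes.
sbatch --partition=standard --mem=512G submit.qs

# Accepted: large-mem nodes report 976 GiB usable, so 512G fits.
sbatch --partition=large-mem --mem=512G submit.qs

# Request all of the memory available on whatever node is assigned.
sbatch --partition=standard --mem=0 submit.qs
</code>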
Four sub-tasks are executed, numbered from 1 through 4. The starting index must be greater than zero, and the ending index must be greater than or equal to the starting index. The //step size// going from one index to the next defaults to one, but can be any positive integer. A step size is appended to the sub-task range as in ''2-20:2'' -- proceed from 2 up to 20 in steps of 2, i.e. 2, 4, 6, ..., 20.
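For example, the step-size form is passed to ''sbatch'' just like a plain range (the script name is a placeholder):

<code bash>
# Run sub-tasks with indices 2, 4, 6, ..., 20 (step size 2):
sbatch --array=2-20:2 job_script.qs

# Inside the job script each sub-task can read its own index:
echo "running sub-task ${SLURM_ARRAY_TASK_ID}"
</code>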
<note important>The default job array size limits are set to 10000 for Slurm on DARWIN to avoid oversubscribing the scheduler node's own resource limits (which would cause scheduling to become sluggish or even unresponsive). See the [[technical:slurm:caviness:arraysize-and-nodecounts#job-array-size-limits|technical explanation]] for why this is necessary.
</note>
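The limits currently in effect can be confirmed from a login node, for example:

<code bash>
# MaxArraySize bounds the largest usable array index; MaxJobCount caps how many
# jobs (including array sub-tasks) the scheduler will track at once.
scontrol show config | grep -E -i 'MaxArraySize|MaxJobCount'
</code>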
==== Partitioning Job Data ====