Specifying the correct node type and amount of memory is important because your allocation is [[abstract:darwin:runjobs:accounting|billed]] based on the Service Unit (SU), and each SU varies with the type of node and amount of memory used. If you specify a node type with a larger amount of memory, you are charged accordingly even if you don't use it.

The table below provides the usable memory values for each type of node currently available on DARWIN. See also the [[abstract:darwin:runjobs:queues#maximum-requestable-memory|maximum requestable memory]] based on the partition (queue) specified and node type.

^Node type ^Slurm selection options ^RealMemory/MiB ^RealMemory/GiB^
|Standard/512 GiB |%%--%%partition=standard | 499712| 488|
|Large Memory/1 TiB |%%--%%partition=large-mem | 999424| 976|
|Extra-Large Memory/2 TiB |%%--%%partition=xlarge-mem | 2031616| 1984|
|nVidia-T4/512 GiB |%%--%%partition=gpu-t4 | 499712| 488|
|nVidia-V100/768 GiB |%%--%%partition=gpu-v100 | 737280| 720|
|amd-MI50/512 GiB |%%--%%partition=gpu-mi50 | 499712| 488|
|Extended Memory/3.73 TiB |%%--%%partition=extended-mem %%--exclusive%% | 999424| 976|

The **Extended Memory** node is accessible by specifying the ''extended-mem'' partition together with the ''%%--%%exclusive'' option. This allows only one user on the node at a time, making all of the swap space accessible to that user's jobs (which share the swap if several run on the node at once); no other user can be on the node during that time.

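For instance, a minimal batch-script header along these lines (the wall time shown is illustrative, and any required account/workgroup options are omitted) would request the extended-memory node for exclusive use:

<code>
#!/bin/bash
#
# Sketch only: request the extended-memory node exclusively
#SBATCH --partition=extended-mem
#SBATCH --exclusive
#SBATCH --time=04:00:00     # illustrative wall time
</code>
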
<note important>**VERY IMPORTANT:** Keep in mind that not all of a node's memory can be reserved for jobs because a small amount is required for system use. As a result, the maximum amount of memory that can be specified is based on what Slurm shows as available. For example, the baseline nodes in DARWIN show a memory size of 488 GiB versus the 512 GiB of physical memory present in them. This means if you try to specify the full amount of memory (i.e. 512G) for the ''standard'' partition, the job will be rejected; the same request will only work if you specify a different partition whose nodes have more memory. For example,
<code>
[(it_css:treine)@r2l00 ~]$
</code>

You may also use ''%%--%%mem=0'' to request all the memory on a node.
</note>
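As an illustration of staying within these limits (the job name and values are hypothetical), a batch script aimed at the 1 TiB ''large-mem'' nodes could request memory as follows:

<code>
#!/bin/bash
#
# Sketch only: 512G fits on large-mem (976 GiB usable), but the same
# request would be rejected on the standard partition (488 GiB usable).
#SBATCH --job-name=mem_demo
#SBATCH --partition=large-mem
#SBATCH --mem=512G
#
# Alternatively, request all memory Slurm reports as available on the node
# (remove one "#" to activate; a directive starting with "##" is ignored):
##SBATCH --mem=0
</code>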

--array=1,2,5,19,27

<note important>The default job array size limit is set to 10000 for Slurm on DARWIN to avoid oversubscribing the scheduler node's own resource limits (which would cause scheduling to become sluggish or even unresponsive). See the [[technical:slurm:caviness:arraysize-and-nodecounts#job-array-size-limits|technical explanation]] for why this is necessary.
</note>

For more details and information, see [[abstract:darwin:runjobs:schedule_jobs#array-jobs1|Array Jobs]].
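As a sketch (the script contents and input file names are hypothetical), an array job restricted to exactly those indices could look like:

<code>
#!/bin/bash
#
# Sketch only: run sub-tasks 1, 2, 5, 19, and 27
#SBATCH --job-name=array_demo
#SBATCH --array=1,2,5,19,27

# Slurm sets SLURM_ARRAY_TASK_ID to this sub-task's index
echo "Processing input.${SLURM_ARRAY_TASK_ID}.dat"
</code>
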
===== Chaining Jobs =====

</code>

The directory layout is self-explanatory: script templates specific to MPI jobs can be found in the ''mpi'' directory, with Open MPI in the ''openmpi'' directory, generic MPI in the ''generic'' directory, and MPICH in the ''mpich'' directory (all under the ''mpi'' directory). A template for serial jobs is ''serial.qs'', and ''threads.qs'' should be used for OpenMP jobs. These scripts are heavily documented to aid in choosing an appropriate template, and they are updated as we uncover best practices and performance issues. Please copy a script template for new projects rather than potentially reusing an older version from a previous project. See [[technical:slurm:darwin:templates:start|DARWIN Slurm Job Script Templates]] for more details.
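For example (paths are illustrative; see the templates page linked above for the actual location of the directory), copy a template into your project area and edit the copy:

<code>
# Sketch only: start a new project from the current serial template
# (run from within the templates directory described above)
cp serial.qs ~/my_project/my_job.qs
</code>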

Need help? See [[http://www.hpc.udel.edu/presentations/intro_to_slurm/|Introduction to Slurm]] in UD's HPC community cluster environment.
Four sub-tasks are executed, numbered from 1 through 4. The starting index must be greater than zero, and the ending index must be greater than or equal to the starting index. The //step size// going from one index to the next defaults to one, but can be any positive integer greater than zero. A step size is appended to the sub-task range, as in ''2-20:2'': proceed from 2 up to 20 in steps of 2, i.e. 2, 4, 6, 8, 10, and so on up to 20.

<note important>The default job array size limit is set to 10000 for Slurm on DARWIN to avoid oversubscribing the scheduler node's own resource limits (which would cause scheduling to become sluggish or even unresponsive). See the [[technical:slurm:caviness:arraysize-and-nodecounts#job-array-size-limits|technical explanation]] for why this is necessary.
</note>
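A minimal sketch of the stepped-range form (the program name is hypothetical):

<code>
#!/bin/bash
#
# Sketch only: sub-tasks 2, 4, 6, ..., 20 (range 2-20 with step size 2)
#SBATCH --array=2-20:2

# Each sub-task handles the chunk matching its index
./process_chunk --chunk-id "${SLURM_ARRAY_TASK_ID}"     # hypothetical program
</code>
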
==== Partitioning Job Data ====