abstract:darwin:runjobs:schedule_jobs

Specifying the correct node type and amount of memory is important because your allocation is [[abstract:darwin:runjobs:accounting|billed]] in Service Units (SUs), and the SU rate varies with the type of node and amount of memory used. If you specify a node type with a larger amount of memory, then you are charged accordingly even if you do not use it.
  
Please see [[abstract:darwin:runjobs:queues#maximum-requestable-memory|maximum requestable memory]] based on the partition (queue) specified and the node type. The table below provides the usable memory values for each type of node currently available on DARWIN.

^Node type                  ^Slurm selection options                       ^RealMemory/MiB  ^RealMemory/GiB^
|Standard/512 GiB           |%%--%%partition=standard                      |    499712|       488|
|Large Memory/1 TiB         |%%--%%partition=large-mem                     |    999424|       976|
|Extra-Large Memory/2 TiB   |%%--%%partition=xlarge-mem                    |   2031616|      1984|
|nVidia-T4/512 GiB          |%%--%%partition=gpu-t4                        |    499712|       488|
|nVidia-V100/768 GiB        |%%--%%partition=gpu-v100                      |    737280|       720|
|amd-MI50/512 GiB           |%%--%%partition=gpu-mi50                      |    499712|       488|
|Extended Memory/3.73 TiB   |%%--%%partition=extended-mem %%--exclusive%%  |    999424|       976|

The **Extended Memory** nodes are accessible by specifying the ''extended-mem'' partition together with the ''%%--exclusive%%'' option. This allows only one user on the node at a time, so all of the node's swap space is available to that user's jobs running on the node at once (they share the swap); no other user can be on the node during that time.
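For example, a minimal submission sketch (the script name ''job.qs'' is a placeholder) that requests an Extended Memory node exclusively:

<code bash>
# Sketch: submit a batch script to the extended-mem partition.  The
# --exclusive flag reserves the whole node, so all of its swap space is
# available to this user's jobs; "job.qs" is a hypothetical script name.
sbatch --partition=extended-mem --exclusive job.qs
</code>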
  
<note important>**VERY IMPORTANT:** Keep in mind that not all memory can be reserved for a node due to a small amount required for system use. As a result, the maximum amount of memory that can be specified is based on what Slurm shows as available. For example, the baseline nodes in DARWIN show a memory size of 488 GiB versus the 512 GiB of physical memory present in them. This means that if you try to specify the full amount of physical memory (i.e., 512G) for the ''standard'' partition, the job will be rejected; specifying 512G only works in a partition whose nodes have more memory. For example,
Line 254: Line 243:
[(it_css:treine)@r2l00 ~]$
</code>
You may also use ''%%--%%mem=0'' to request all the memory on a node.
</note>
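As an illustration, here is a minimal batch-script sketch for a ''standard'' node; the job name and program are placeholders, and other options required by your allocation are omitted:

<code bash>
#!/bin/bash
#
# Sketch: request memory on a standard (512 GiB) node within the limit
# Slurm reports (488 GiB).  Replace the placeholders with your own values.
#SBATCH --job-name=mem_example       # hypothetical job name
#SBATCH --partition=standard
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --mem=488G                   # at or below what Slurm shows; 512G would be rejected
##SBATCH --mem=0                     # alternative: request all memory available on the node

./my_program                         # hypothetical executable
</code>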
  
  
--array=1,2,5,19,27

<note important>The default job array size limits are set to 10000 for Slurm on DARWIN to avoid oversubscribing the scheduler node's own resource limits (causing scheduling to become sluggish or even unresponsive). See the [[technical:slurm:caviness:arraysize-and-nodecounts#job-array-size-limits|technical explanation]] for why this is necessary.
</note>

For more details, see [[abstract:darwin:runjobs:schedule_jobs#array-jobs1|Array Jobs]].
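As a brief illustration (the script name ''job.qs'' is a placeholder), such an explicit index list can be supplied at submission time:

<code bash>
# Sketch: run only sub-tasks 1, 2, 5, 19, and 27 of an array job.
sbatch --array=1,2,5,19,27 job.qs
</code>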
===== Chaining Jobs =====
  
Four sub-tasks are executed, numbered 1 through 4.  The starting index must be greater than zero, and the ending index must be greater than or equal to the starting index.  The //step size// going from one index to the next defaults to one, but can be any positive integer.  A step size is appended to the sub-task range as in ''2-20:2'' -- proceed from 2 up to 20 in steps of 2, i.e. 2, 4, 6, 8, 10, and so on.
  
<note important>The default job array size limits are set to 10000 for Slurm on DARWIN to avoid oversubscribing the scheduler node's own resource limits (causing scheduling to become sluggish or even unresponsive). See the [[technical:slurm:caviness:arraysize-and-nodecounts#job-array-size-limits|technical explanation]] for why this is necessary.
</note>
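For example, a minimal sketch of an array job using the ''2-20:2'' range described above (the job name and script body are placeholders); each sub-task can read its own index from the ''SLURM_ARRAY_TASK_ID'' environment variable:

<code bash>
#!/bin/bash
#
# Sketch: ten sub-tasks with indices 2, 4, 6, ..., 20.
#SBATCH --job-name=array_example     # hypothetical job name
#SBATCH --array=2-20:2

# Each sub-task sees its own index in SLURM_ARRAY_TASK_ID, which can be
# used to select its portion of the work (e.g. an input file).
echo "This is sub-task ${SLURM_ARRAY_TASK_ID}"
</code>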
==== Partitioning Job Data ====