abstract:darwin:earlyaccess

Differences

This shows you the differences between two versions of the page.

Link to this comparison view

abstract:darwin:earlyaccess [2021-04-24 14:45] – [Local filesystems] anitaabstract:darwin:earlyaccess [2021-04-27 16:21] (current) – external edit 127.0.0.1
Line 87: Line 87:
Each node scratch filesystem disk is only accessible by the node in which it is physically installed. The job scheduling system creates a temporary directory associated with each running job on this filesystem. When your job terminates, the job scheduler automatically erases that directory and its contents.
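
As a minimal sketch of how a job script can take advantage of this per-job directory (assuming the scheduler exports its path as ''TMPDIR'', which is an assumption here rather than something stated on this page), copy inputs in, work locally, and copy results back before the job ends:

<code bash>
# Sketch only: stage work through the per-job node-scratch directory.
# The $TMPDIR location is assumed; input.dat/output.dat are placeholders.
cp "$HOME/project/input.dat" "$TMPDIR/"
cd "$TMPDIR"

# ... run your application against the local copy here ...

# Copy results back to Lustre before the job ends; the scheduler
# erases this directory and its contents when the job terminates.
cp output.dat "$WORKDIR_USER/"
</code>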
  
More detailed information about DARWIN storage and quotas can be found on the <html><span style="color:#ffffff;background-color:#2fa4e7;padding:3px 7px !important;border-radius:4px;">sidebar</span></html> under [[abstract:darwin:filesystems:filesystems|Storage]].
===== Software =====
  
A list of installed software that IT builds and maintains for DARWIN users can be found by [[abstract:darwin:system_access:system_access#logging-on-to-caviness|logging into DARWIN]] and using the VALET command ''vpkg_list''.
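
For example, a short sketch of listing and loading software with VALET (the package and version shown are purely illustrative; ''vpkg_list'' will show what is actually installed):

<code bash>
# List all software packages VALET manages on DARWIN
vpkg_list

# Show the versions available for one package, then add it to your
# environment (package/version here are examples only)
vpkg_versions python
vpkg_require python/3.8
</code>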

Documentation for all software is organized in alphabetical order on the <html><span style="color:#ffffff;background-color:#2fa4e7;padding:3px 7px !important;border-radius:4px;">sidebar</span></html> under [[software:software|Software]]. There will likely not be details by cluster for DARWIN; however, the Caviness documentation should still be applicable for now.
  
There will **not** be a full set of software during early access and testing, but we will be continually installing and updating software.  Installation priority will go to compilers, system libraries, and highly utilized software packages. Please DO let us know if there are packages that you would like to use on DARWIN, as that will help us prioritize user needs, but understand that we may not be able to install software requests in a timely manner.
==== Queues (Partitions) ====
  
During Phase 2 early access, partitions have been created to align with allocation requests moving forward, based on the different node types. There is no default partition, and you may specify only one partition at a time; it is not possible to specify multiple partitions in Slurm to span different node types.
  
See [[abstract/darwin/runjobs/queues|Queues]] on the <html><span style="color:#ffffff;background-color:#2fa4e7;padding:3px 7px !important;border-radius:4px;">sidebar</span></html> for detailed information about the available partitions on DARWIN.
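
As a hedged sketch of what choosing a single partition looks like in a batch script (the partition name ''standard'' and the resource values are placeholders; use a partition listed on the Queues page that matches your allocation):

<code bash>
#!/bin/bash -l
#
# Exactly one partition must be specified; Slurm on DARWIN cannot
# span multiple partitions or node types in a single job.
# "standard" below is only an illustrative name.
#SBATCH --partition=standard
#SBATCH --ntasks=1
#SBATCH --time=0-00:30:00

echo "Running on $(hostname)"
</code>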
  
We fully expect these limits to be changed and adjusted during the early access period.
==== Run Jobs ====
  
In order to schedule any job (interactively or batch) on the DARWIN cluster, you must set your workgroup to define your cluster group. For Phase 2 early access, each research group has been assigned a unique workgroup and should have received this information in a welcome email. For example,
  
<code bash>
workgroup -g it_css
</code>
  
will set your workgroup to ''it_css''. You will know if you are in your workgroup based on the change in your bash prompt.  See the following example for user ''traine'':
  
<code bash>
[traine@login00.darwin ~]$ workgroup -g it_css
[(it_css:traine)@login00.darwin ~]$ printenv USER HOME WORKDIR WORKGROUP WORKDIR_USER
traine
/home/1201
/lustre/it_css
it_css
/lustre/it_css/users/1201
[(it_css:traine)@login00.darwin ~]$
</code>
  
Now we can use ''salloc'' or ''sbatch'' to submit an interactive or batch job respectively, as long as a [[abstract:darwin:runjobs:queues|partition]] is also specified.  See the DARWIN [[abstract:darwin:runjobs:runjobs|Run Jobs]], [[abstract:darwin:runjobs:schedule_jobs|Schedule Jobs]] and [[abstract:darwin:runjobs:job_status|Managing Jobs]] wiki pages for more help with Slurm, including how to specify resources and check the status of your jobs.
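
For instance, a minimal sketch of an interactive request (the partition name and time limit are illustrative only):

<code bash>
# Request an interactive session; "standard" and the 30-minute
# limit are placeholder values, not DARWIN-specific guidance.
[(it_css:traine)@login00.darwin ~]$ salloc --partition=standard --nodes=1 --time=30:00
</code>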
  
  
<note tip>It is a good idea to periodically check in ''/opt/shared/templates/slurm/'' for updated or new [[technical:slurm:darwin:templates:start|templates]] to use as job scripts to run generic or specific applications, designed to provide the best performance on DARWIN.</note>
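
For example, a hedged way to start from one of those templates (the subdirectory and file names below are assumptions; list the directory to see what is actually provided):

<code bash>
# Browse the available job script templates
ls /opt/shared/templates/slurm/

# Copy one as a starting point for your own job script
# (the template path shown is only an example)
cp /opt/shared/templates/slurm/generic/serial.qs my_job.qs
</code>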
  
See [[abstract/darwin/runjobs/|Run jobs]] on the <html><span style="color:#ffffff;background-color:#2fa4e7;padding:3px 7px !important;border-radius:4px;">sidebar</span></html> for detailed information about running jobs on DARWIN, and specifically [[abstract:darwin:runjobs:schedule_jobs#command-options|Schedule job options]] for memory, time, GPUs, etc.
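
As a brief, non-authoritative sketch of what those options can look like on the ''sbatch'' command line (the partition, memory, and time values are placeholders, not recommendations):

<code bash>
# Illustrative resource requests only -- see Schedule job options
# for the flags and values that actually apply on DARWIN.
sbatch --partition=standard \
       --ntasks=4 \
       --mem=8G \
       --time=1-00:00:00 \
       my_job.qs

# GPU partitions additionally take a GPU request, e.g. --gpus=1
</code>
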
===== Help =====
  