===== Home filesystem =====
  
The 13.5 TiB filesystem uses 960 GiB enterprise class SSD drives in a triple-parity RAID configuration for high reliability and availability. The filesystem is accessible to all nodes via IPoIB on the 100 Gbit/s InfiniBand network.
  
==== Home storage ====
  
Each user has 20 GB of disk storage reserved for personal use on the home filesystem. Users' home directories are in ''/home'' (e.g., ''/home/1005''), and the directory name is put in the environment variable ''$HOME'' at login.
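A minimal sketch of how to check your home directory location and usage from a login shell, using standard shell utilities (the output will vary by account):

<code>
# Show where your home directory is
echo $HOME

# Total size of everything currently stored in your home directory
du -sh $HOME

# Usage of the filesystem backing your home directory
df -h $HOME
</code>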
===== High-performance Lustre filesystem =====
  
Lustre is designed to use parallel I/O techniques to reduce file-access time. The Lustre filesystems in use at UD are composed of many physical disks using RAID technologies to give resilience, data integrity, and parallelism at multiple levels. There is approximately 1.1 PiB of Lustre storage available on DARWIN. It uses high-bandwidth interconnects such as Mellanox HDR100. Lustre should be used for storing input files, supporting data files, work files, and output files associated with computational tasks run on the cluster.
==== Workgroup storage ====
  
Allocation workgroup storage is available on a [[abstract:darwin:filesystems:lustre|high-performance Lustre-based filesystem]] having almost 1.1 PB of usable space. Users should have a basic understanding of the concepts of [[abstract:darwin:filesystems:lustre|Lustre]] to take full advantage of this filesystem. The default stripe count is set to 1, so each file is written as a single stripe, with files distributed across all available OSTs on Lustre. See [[https://www.nas.nasa.gov/hecc/support/kb/lustre-best-practices_226.html|Lustre Best Practices]] from NASA.
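For example, striping can be inspected or adjusted per directory with the standard Lustre ''lfs'' utility; the paths below are illustrative and follow the workgroup layout described in the next paragraphs:

<code>
# Show the default stripe layout of a directory
lfs getstripe -d /lustre/it_css/users/1005

# Set a default stripe count of 4 on an existing directory so large files
# created there are spread across 4 OSTs for parallel I/O
lfs setstripe -c 4 /lustre/it_css/users/1005/large_files
</code>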
  
Each allocation will have at least 1 TiB of shared ([[abstract:darwin:app_dev:compute_env#using-workgroup-and-directories|workgroup]]) storage in the ''/lustre/'' directory identified by the <<//allocation_workgroup//>> (e.g., ''/lustre/it_css''). It is accessible by all users in the allocation workgroup and is referred to as your workgroup directory (''$WORKDIR''), if the allocation workgroup has been set.
  
Each user in the allocation workgroup will have a ''/lustre/<<//workgroup//>>/users/<<//uid//>>'' directory to be used as personal workgroup storage for running jobs and for storing larger amounts of data such as input files, supporting data files, work files, output files, and source code. It can be referred to as ''$WORKDIR_USERS'', if the allocation workgroup has been set.
Each allocation will also have a ''/lustre/<<//workgroup//>>/sw'' directory to allow users to install software to be shared for the allocation workgroup. It can be referred to as ''$WORKDIR_SW'', if the allocation workgroup has been set. In addition, a ''/lustre/<<//workgroup//>>/sw/valet'' directory is also provided to store VALET package files to be shared for the allocation workgroup.
  
Please see [[abstract:darwin:app_dev:compute_env#using-workgroup-and-directories|workgroup]] for complete details on environment variables.
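As a quick sketch, once the allocation workgroup has been set (via the ''workgroup'' command covered on the linked page), the directories above are available through environment variables; ''it_css'' is used here purely as an example workgroup:

<code>
workgroup -g it_css    # set the allocation workgroup (example name)

echo $WORKDIR          # /lustre/it_css
echo $WORKDIR_USERS    # /lustre/it_css/users/<uid>
echo $WORKDIR_SW       # /lustre/it_css/sw

cd $WORKDIR_USERS      # work from your personal workgroup directory
</code>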
  
**Note**: A full filesystem inhibits use for everyone, preventing jobs from running.
===== Local filesystem =====
  
==== Node temporary storage ====
  
Each compute node has its own 2 TB local hard drive, which is needed for time-critical tasks such as managing virtual memory. The system's usage of the local disk is kept small so that most of the local disk remains available to applications running on the node.
  
==== Workgroup ====
  
All of Lustre is available for allocation workgroup storage. To check Lustre usage for all users, use ''df -h /lustre''.
  
The example below shows 25 TB is in use out of 954 TB of usable Lustre storage.
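A sketch of the kind of output to expect is shown below; the device name is a placeholder and the exact numbers on your system will differ:

<code>
$ df -h /lustre
Filesystem          Size  Used Avail Use% Mounted on
<lustre-servers>    954T   25T  929T   3% /lustre
</code>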
  
==== Node ====
The node temporary storage is mounted on ''/tmp'' for all nodes. There is no quota, and if you exceed the physical size of the disk you will get disk failure messages. To check the usage of your disk, use the ''df -h'' command **on the compute node** where your job is running.
  
We strongly recommend that you refer to the node temporary storage by using the environment variable ''$TMPDIR'', which is defined by Slurm when using ''salloc'', ''srun'', or ''sbatch''.
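For instance, a minimal batch-script sketch that stages work through ''$TMPDIR'' and copies results back before the job ends; ''my_program'' and the data file names are hypothetical, and the script assumes it is submitted from a shell where the allocation workgroup has been set so that ''$WORKDIR_USERS'' is defined:

<code>
#!/bin/bash
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

# Stage input to node-local temporary storage and run there
cp "$WORKDIR_USERS"/input.dat "$TMPDIR"/
cd "$TMPDIR"
"$WORKDIR_USERS"/my_program input.dat > output.dat

# Copy results back to workgroup storage before the job ends
cp output.dat "$WORKDIR_USERS"/
</code>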
==== Home and Workgroup ====
  
<note important>While all filesystems on the DARWIN cluster utilize hardware redundancies to protect data, there is **no** backup or replication and **no** recovery available for the home or Lustre filesystems. All backups are the responsibility of the user. DARWIN's systems administrators are not liable for any lost data.
</note>
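Because no backups are taken, one simple approach is to copy important results to storage you control. The sketch below uses ''rsync'' run from your own machine; the login hostname and paths are assumptions for illustration only:

<code>
# Run on your local machine: pull a results directory from DARWIN into a local backup
rsync -av <username>@darwin.hpc.udel.edu:/lustre/it_css/users/<uid>/results/ ~/darwin_backup/results/
</code>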
  