====== DARWIN Filesystems ======

===== Home filesystem =====

The 13.5 TiB home filesystem uses 960 GiB enterprise-class SSD drives in a triple-parity RAID configuration for high reliability and availability. The filesystem is accessible to all nodes via IPoIB on the 100 Gbit/s InfiniBand network.

==== Home storage ====

Each user has 20 GB of disk storage reserved for personal use on the home filesystem. Users' home directories are in ''/home'' (e.g., ''/home/1005''), and the directory name is put in the environment variable ''$HOME'' at login.

===== High-performance Lustre filesystem =====

Lustre is designed to use parallel I/O techniques to reduce file-access time. The Lustre filesystems in use at UD are composed of many physical disks using RAID technologies to give resilience, data integrity, and parallelism at multiple levels. There is approximately 1.1 PiB of Lustre storage available on DARWIN. It uses high-bandwidth interconnects such as Mellanox HDR100. Lustre should be used for storing input files, supporting data files, work files, and output files associated with computational tasks run on the cluster.

==== Workgroup storage ====

Allocation workgroup storage is available on a [[abstract:darwin:filesystems:lustre|high-performance Lustre-based filesystem]] with almost 1.1 PB of usable space. Users should have a basic understanding of [[abstract:darwin:filesystems:lustre|Lustre]] concepts to take full advantage of this filesystem. The default stripe count is set to 1, so each file is written as a single stripe on one OST, with files distributed across all available OSTs on Lustre. See [[https://www.nas.nasa.gov/hecc/support/kb/lustre-best-practices_226.html|Lustre Best Practices]] from NASA.

Each allocation will have at least 1 TiB of shared ([[abstract:darwin:app_dev:compute_env#using-workgroup-and-directories|workgroup]]) storage in the ''/lustre/'' directory identified by the workgroup name (e.g., ''/lustre/it_css''), accessible by all users in the allocation workgroup. It is referred to as your workgroup directory (''$WORKDIR'') if the allocation workgroup has been set.

Each user in the allocation workgroup will have a ''/lustre/<workgroup>/users/<user>'' directory to be used as personal workgroup storage for running jobs and for storing larger amounts of data, input files, supporting data files, work files, output files, and source code. It can be referred to as ''$WORKDIR_USERS'' if the allocation workgroup has been set.

Each allocation will also have a ''/lustre/<workgroup>/sw'' directory to allow users to install software to be shared with the allocation workgroup. It can be referred to as ''$WORKDIR_SW'' if the allocation workgroup has been set. In addition, a ''/lustre/<workgroup>/sw/valet'' directory is provided to store VALET package files to be shared with the allocation workgroup. Please see [[abstract:darwin:app_dev:compute_env#using-workgroup-and-directories|workgroup]] for complete details on environment variables.

**Note**: A full filesystem inhibits use for everyone, preventing jobs from running.
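The default stripe count of 1 mentioned above can be changed per directory when working with very large files. The commands below are a minimal sketch using the standard Lustre ''lfs'' tools; the directory names are placeholders, and whether a larger stripe count actually helps depends on your file sizes and access patterns (see the NASA best-practices link above).

<code bash>
# Show the current striping of a directory or file (path is a placeholder).
lfs getstripe $WORKDIR_USERS/big-data

# Stripe new files created in this directory across 4 OSTs; existing files
# keep their old layout until they are rewritten or copied.
lfs setstripe -c 4 $WORKDIR_USERS/big-data

# A stripe count of -1 means "stripe across all available OSTs".
lfs setstripe -c -1 $WORKDIR_USERS/very-large-files
</code>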
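As a quick orientation, the session below sketches how the workgroup-related environment variables map onto the Lustre paths described above. It assumes the UD ''workgroup'' command and the ''it_css'' workgroup used in the examples on this page; substitute your own allocation workgroup, and see the workgroup documentation linked above for the authoritative details.

<code bash>
# Start a shell with the allocation workgroup set (defines $WORKDIR and related variables).
workgroup -g it_css

# Shared workgroup directory, personal work area, and shared software area.
echo $WORKDIR          # /lustre/it_css
echo $WORKDIR_USERS    # your personal directory under /lustre/it_css/users/
echo $WORKDIR_SW       # /lustre/it_css/sw

# VALET package files shared with the workgroup live here.
ls $WORKDIR_SW/valet
</code>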
===== Local filesystem =====

==== Node temporary storage ====

Each compute node has its own 2 TB local hard drive, which is needed for time-critical tasks such as managing virtual memory. The system usage of the local disk is kept as small as possible to leave as much of it as possible for applications running on the node.

===== Quotas and usage =====

To help users maintain awareness of quotas and their usage on the ''/home'' filesystem, the command ''my_quotas'' is available to display a list of all quota-controlled filesystems on which the user has storage space. For example, the following shows the amount of storage available and in use for user ''traine'' in workgroup ''it_css'' for their home and workgroup directories.

<code>
$ my_quotas
Type  Path            In-use / kiB  Available / kiB  Pct
----- --------------  ------------  ---------------  ----
user  /home/1201           7497728         20971520  36%
group /lustre/it_css            228       1073741824   0%
</code>

==== Home ====

Each user's home directory has a hard quota limit of 20 GB. To check usage, use ''df -h $HOME''. The example below displays the usage for the home directory (''/home/1201'') for the account ''traine'' as 7.2 GB used out of 20 GB, which matches the example provided by the ''my_quotas'' command above.

<code>
$ df -h $HOME
Filesystem                 Size  Used Avail Use% Mounted on
nfs0-ib:/beagle/home/1201   20G  7.2G   13G  36% /home/1201
</code>

==== Workgroup ====

All of Lustre is available for allocation workgroup storage. To check Lustre usage for all users, use ''df -h /lustre''. The example below shows 25 TB in use, leaving 954 TB available out of 978 TB of usable Lustre storage.

<code>
$ df -h /lustre
Filesystem                             Size  Used Avail Use% Mounted on
10.65.2.6@o2ib:10.65.2.7@o2ib:/darwin  978T   25T  954T   3% /lustre
</code>

To see your allocation workgroup usage, please use the ''my_quotas'' command. Again, the following example shows the amount of storage available and in use for user ''traine'' in allocation workgroup ''it_css'' for their home and allocation workgroup directories.

<code>
$ my_quotas
Type  Path            In-use / kiB  Available / kiB  Pct
----- --------------  ------------  ---------------  ----
user  /home/1201           7497728         20971520  36%
group /lustre/it_css            228       1073741824   0%
</code>

==== Node ====

The node temporary storage is mounted on ''/tmp'' for all nodes. There is no quota, and if you exceed the physical size of the disk you will get disk failure messages. To check the usage of your disk, use the ''df -h'' command **on the compute node** where your job is running. We strongly recommend that you refer to the node scratch storage by using the environment variable ''$TMPDIR'', which is defined by Slurm when using ''salloc'', ''srun'', or ''sbatch''.

For example, the command ''ssh r1n00 df -h /tmp'' shows size, used, and available space in M, G, or T units:

<code>
Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3       1.8T   41M  1.8T   1% /tmp
</code>

This node ''r1n00'' has a 2 TB disk, with only 41 MB used, so 1.8 TB is available for your job. There is a physical disk installed on each node that is used for time-critical tasks, such as swapping memory. Most of the compute nodes are configured with a 2 TB disk; however, the ''/tmp'' filesystem will never have the full capacity of the disk, since larger-memory nodes need to use more of it for swap space.

===== Recovering files =====

==== Home and Workgroup ====

While all filesystems on the DARWIN cluster utilize hardware redundancies to protect data, there is **no** backup or replication and **no** recovery available for the home or Lustre filesystems. All backups are the responsibility of the user. DARWIN's systems administrators are not liable for any lost data.
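Since no backups are taken for you, one simple approach is to periodically copy important results to storage you control. The following is a minimal sketch using ''rsync'' run from your own machine; the username, login hostname, and paths are placeholders drawn from the examples on this page and should be replaced with your own.

<code bash>
# Pull a results directory off DARWIN's Lustre storage to local storage you manage.
# "traine", the hostname, and both paths are placeholders; adjust to your account.
rsync -av --progress \
    traine@darwin.hpc.udel.edu:/lustre/it_css/users/1201/results/ \
    ~/darwin-backups/results/
</code>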
===== Usage Recommendations =====

**Home directory**: Use your [[#home|home]] directory to store private files. Application software you use will often store its configuration, history, and cache files in your home directory. Generally, keep this directory free and use it for files needed to configure your environment. For example, add [[http://en.wikipedia.org/wiki/Symbolic_link#POSIX_and_Unix-like_operating_systems|symbolic links]] in your home directory that point to files in any of the other directories.

**Workgroup directory**: Use the personal allocation [[#workgroup|workgroup]] directory (''$WORKDIR_USERS'') as an extension of your home directory for running jobs and for storing larger amounts of data, input files, supporting data files, work files, output files, and source code. See [[abstract:darwin:app_dev:app_dev|Application Development]] for information on building applications and [[abstract:darwin:install_software:install_software:|Installing Software]]. It is also appropriate to use the software allocation [[#workgroup|workgroup]] directory (''$WORKDIR_SW'') to build applications for everyone in your allocation group, and to create a VALET package in ''$WORKDIR_SW/valet'' so your fellow researchers can access applications you want to share. See [[abstract:darwin:install_software:workgroup-sw|Workgroup Software Installs on DARWIN]] for details.

**Node scratch directory**: Use the [[#node-scratch|node scratch]] directory for temporary files. The job scheduler software (Slurm) creates a temporary directory in ''/tmp'' specifically for each job's temporary files on each node assigned to the job. When the job is complete, the subdirectory and its contents are deleted, automatically freeing the local scratch storage for others. Files in node scratch directories are not available to the head node or to other compute nodes. A minimal job-script sketch using this directory is shown below.
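As a rough illustration of that pattern, the sketch below stages data into ''$TMPDIR'', runs there, and copies results back to Lustre before the job ends. The partition, program, and file names (''standard'', ''my_program'', ''input.dat'') are placeholders, other directives required by your allocation may be missing, and ''$WORKDIR_USERS'' is assumed to be defined because the job was submitted from a shell with the allocation workgroup set.

<code bash>
#!/bin/bash
#SBATCH --job-name=scratch-example
#SBATCH --partition=standard      # placeholder; use a partition your allocation can access
#SBATCH --ntasks=1
#SBATCH --time=01:00:00

# $TMPDIR is set by Slurm to a per-job directory on the node's local /tmp disk.
# Stage the program and input from Lustre workgroup storage to node-local scratch.
cp "$WORKDIR_USERS/my_program" "$WORKDIR_USERS/input.dat" "$TMPDIR/"

# Run in node-local scratch to keep small, temporary I/O off Lustre.
cd "$TMPDIR"
./my_program input.dat > output.dat

# Copy results back to Lustre BEFORE the job ends; $TMPDIR is deleted afterwards.
cp output.dat "$WORKDIR_USERS/"
</code>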