DARWIN Filesystems

The 13.5 TiB home filesystem uses 960 GiB enterprise-class SSD drives in a triple-parity RAID configuration for high reliability and availability. The filesystem is accessible to all nodes via IPoIB on the 100 Gbit/s InfiniBand network.

Each user has 20 GB of disk storage reserved for personal use on the home filesystem. Users' home directories reside in /home (e.g., /home/1005), and the path is stored in the environment variable $HOME at login.
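
For example, you can confirm your home directory path at the prompt; the path shown here corresponds to the example uid above:

$ echo $HOME
/home/1005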

Lustre is designed to use parallel I/O techniques to reduce file-access time. The Lustre filesystems in use at UD are composed of many physical disks using RAID technologies to give resilience, data integrity, and parallelism at multiple levels. There is approximately 1.1 PiB of Lustre storage available on DARWIN. It uses high-bandwidth interconnects such as Mellanox HDR100. Lustre should be used for storing input files, supporting data files, work files, and output files associated with computational tasks run on the cluster.

Allocation workgroup storage is available on a high-performance Lustre-based filesystem having almost 1.1 PB of usable space. Users should have a basic understanding of the concepts of Lustre to take full advantage of this filesystem. The default stripe count is set to 1, meaning each file is written to a single stripe, with files distributed across all available OSTs on Lustre. See Lustre Best Practices from NASA.
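
If you need to inspect or adjust striping for a particular directory, the standard Lustre lfs utility can be used. This is only a sketch; the directory name is illustrative:

$ lfs getstripe mydir          # show the current stripe settings for mydir
$ lfs setstripe -c 4 mydir     # new files created in mydir will be striped across 4 OSTs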

Each allocation will have at least 1 TiB of shared (workgroup) storage in the /lustre/ directory identified by the «allocation_workgroup» (e.g., /lustre/it_css). It is accessible by all users in the allocation workgroup and is referred to as your workgroup directory ($WORKDIR), if the allocation workgroup has been set.

Each user in the allocation workgroup will have a /lustre/«workgroup»/users/«uid» directory to be used as a personal workgroup storage directory for running jobs, storing larger amounts of data, input files, supporting data files, work files, output files and source code. It can be referred to as $WORKDIR_USERS, if the allocation workgroup has been set.

Each allocation will also have a /lustre/«workgroup»/sw directory to allow users to install software to be shared with the allocation workgroup. It can be referred to as $WORKDIR_SW, if the allocation workgroup has been set. In addition, a /lustre/«workgroup»/sw/valet directory is provided to store VALET package files to be shared with the allocation workgroup.

Please see workgroup for complete details on environment variables.
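
For example, after setting your allocation workgroup (it_css and uid 1201 are the same placeholders used in the examples below), the variables resolve to the directories described above; your own output will reflect your workgroup and uid:

$ workgroup -g it_css
$ echo $WORKDIR $WORKDIR_USERS $WORKDIR_SW
/lustre/it_css /lustre/it_css/users/1201 /lustre/it_css/sw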

Note: A full filesystem inhibits use for everyone, preventing jobs from running.

Each compute node has its own 2 TB local hard drive, which is needed for time-critical tasks such as managing virtual memory. System usage of the local disk is kept as small as possible so that most of the disk remains available to applications running on the node.

To help users maintain awareness of quotas and their usage on the /home filesystem, the command my_quotas is available to display a list of all quota-controlled filesystems on which the user has storage space.

For example, the following shows the amount of storage available and in-use for user traine in workgroup it_css for their home and workgroup directories.

$ my_quotas
Type  Path           In-use / kiB Available / kiB  Pct
----- -------------- ------------ --------------- ----
user  /home/1201          7497728        20971520  36%
group /lustre/it_css          228      1073741824   0%

Each user's home directory has a hard quota limit of 20 GB. To check usage, use

    df -h $HOME

The example below displays the usage of the home directory (/home/1201) for the account traine as 7.2 GB used out of 20 GB, which matches the figures reported above by the my_quotas command.

$ df -h $HOME
Filesystem                 Size  Used Avail Use% Mounted on
nfs0-ib:/beagle/home/1201   20G  7.2G   13G  36% /home/1201

All of Lustre is available for allocation workgroup storage. To check Lustre usage for all users, use df -h /lustre.

The example below shows 25 TB in use out of 978 TB of usable Lustre storage, leaving 954 TB available.

$ df -h /lustre
Filesystem                             Size  Used Avail Use% Mounted on
10.65.2.6@o2ib:10.65.2.7@o2ib:/darwin  978T   25T  954T   3% /lustre

To see your allocation workgroup usage, please use the my_quotas command. Again, the following example shows the amount of storage available and in-use for user traine in allocation workgroup it_css for their home and allocation workgroup directories.

$ my_quotas
Type  Path           In-use / kiB Available / kiB  Pct
----- -------------- ------------ --------------- ----
user  /home/1201          7497728        20971520  36%
group /lustre/it_css          228      1073741824   0%

The node temporary storage is mounted on /tmp for all nodes. There is no quota, and if you exceed the physical size of the disk you will get disk failure messages. To check the usage of your disk, use the df -h command on the compute node where your job is running.

We strongly recommend that you refer to the node scratch by using the environment variable $TMPDIR, which is defined by Slurm when using salloc, srun or sbatch.

For example, the command

   ssh r1n00 df -h /tmp

shows size, used and available space in M, G or T units.

Filesystem      Size  Used Avail Use% Mounted on
/dev/sda3       1.8T   41M  1.8T   1% /tmp

This node r1n00 has a 2 TB disk, with only 41 MB used, so 1.8 TB is available for your job.

There is a physical disk installed on each node that is used for time-critical tasks, such as swapping memory. Most of the compute nodes are configured with a 2 TB disk; however, the /tmp filesystem will never have the entire disk available. Larger-memory nodes will need to use more of the disk for swap space.

While all filesystems on the DARWIN cluster utilize hardware redundancies to protect data, there is no backup or replication and no recovery available for the home or Lustre filesystems. All backups are the responsibility of the user. DARWIN's systems administrators are not liable for any lost data.

Home directory: Use your home directory to store private files. Application software you use will often store its configuration, history, and cache files in your home directory. Generally, keep this directory free and use it for files needed to configure your environment. For example, add symbolic links in your home directory to point to files in any of the other directories.
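
For instance, a symbolic link (the names here are illustrative) keeps large data in your workgroup directory on Lustre while remaining easy to reach from your home directory:

$ ln -s $WORKDIR_USERS/projectA ~/projectA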

Workgroup directory: Use the personal allocation workgroup directory ($WORKDIR_USERS) as an extension of your home directory for running jobs and for storing larger amounts of data, input files, supporting data files, work files, output files, and source code. See Application Development for information on building applications and Installing Software. It is also appropriate to use the software allocation workgroup directory ($WORKDIR_SW) to build applications for everyone in your allocation group, as well as to create VALET packages in $WORKDIR_SW/valet so your fellow researchers can access applications you want to share. See Workgroup Software Installs on DARWIN for details.
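
A possible layout for a shared install, with placeholder package name and version (see Workgroup Software Installs on DARWIN for the corresponding VALET package file):

$ mkdir -p $WORKDIR_SW/mypackage/1.0
$ mkdir -p $WORKDIR_SW/valet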

Node scratch directory: Use the node scratch directory for temporary files. The job scheduler software (Slurm) creates a temporary directory in /tmp specifically for each job's temporary files, on each node assigned to the job. When the job is complete, the subdirectory and its contents are deleted, automatically freeing up the local scratch storage that others may need. Files in node scratch directories are not available to the head node or other compute nodes.
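
As a sketch of this workflow (the script, application, and file names are illustrative, and the allocation workgroup is assumed to have been set so that $WORKDIR_USERS is defined), a batch script can stage data into $TMPDIR and copy results back to Lustre before the job ends:

#!/bin/bash
#SBATCH --ntasks=1
#
# Stage input from Lustre into the per-job scratch directory created by Slurm.
cp $WORKDIR_USERS/input.dat $TMPDIR/
cd $TMPDIR
# Run the (hypothetical) application against the local copy of the data.
$WORKDIR_USERS/my_app input.dat > output.dat
# Copy results back to Lustre before the job ends and $TMPDIR is removed.
cp output.dat $WORKDIR_USERS/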
