Farber Storage
Permanent filesystem
The 65 TB permanent filesystem uses 3 TB enterprise class SATA drives in a triple-parity RAID configuration for high reliability and availability. The filesystem is accessible to the head node via 10 Gbit/s Ethernet and to the compute nodes via 1 Gbit/s Ethernet.
Home storage
Each user has 20 GB of disk storage reserved for personal use on the home filesystem. Users' home directories are in /home (e.g., /home/1005), and the directory name is put in the environment variable $HOME at login.
The permanent file system is configured to allow nearly instantaneous, consistent snapshots. The snapshot contains the original version of the file system, and the live file system contains any changes made since the snapshot was taken. In addition,
all your files are regularly replicated at UD's off-campus disaster recovery site. You can use read-only snapshots to revert to a previous version of a file, or request to have your files restored from the disaster recovery site.
You can check the size and usage of your home directory with the command
df -h $HOME
Workgroup storage
Each research group has at least 1000 GB of shared group (workgroup) storage in the /home/work directory. It is identified by the «investing_entity» name (e.g., /home/work/it_css) and is referred to as your workgroup directory. This is used for input files, supporting data files, work files, output files, and source code and executables that need to be shared with your research group.
Just as with your home directory, read-only snapshots of your workgroup's files are made several times per day and kept for the past week. In addition, the filesystem is replicated at UD's off-campus disaster recovery site. Snapshots are user-accessible, and older files may be retrieved by special request.
You can check the size and usage of your workgroup directory by first using the workgroup command to spawn a new workgroup shell, which sets the environment variable $WORKDIR, and then running
df -h $WORKDIR
When you list the /home/work directory, you will only see the directories most recently mounted. If you do not see your directory, you can trigger the auto-mount by referencing the directory in any command, such as df.
High-performance filesystem
Lustre storage
User storage is available on a high-performance Lustre-based filesystem with 257 TB of usable space. This is used for temporary input files, supporting data files, work files, and output files associated with computational tasks run on the cluster. The filesystem is accessible to all of the processor cores via 56 Gbit/s (FDR) InfiniBand. The default stripe count is 1, so each file is written to a single OST, with files distributed across all available OSTs on Lustre. See Lustre Best Practices from NASA.
Source code and executables should be kept in your Home ($HOME) or Workgroup ($WORKDIR) storage. No executables are permitted to run from the Lustre filesystem.
The Lustre filesystem is not backed up. However, it is a robust RAID-6 system. Thus, the filesystem can survive a concurrent disk failure of two independent hard drives and still rebuild its contents automatically.
The /lustre filesystem is partitioned as shown below:
Directory | Description |
---|---|
work | Private work directories for individual investor-groups |
scratch | Public scratch space for all users |
All users will use the public scratch directory (/lustre/scratch). IT staff will run regular cleanup procedures to purge aged files or directories in /lustre/scratch to avoid degrading system performance.
An investing-entity may purchase private Lustre storage that is mounted in /lustre/work and is not subject to the same regular cleanup procedures as /lustre/scratch. Each investing-entity's principal stakeholder is responsible for maintenance of the group's private Lustre directory.
The default group ownership for a file created in a private work directory is the group's name, and its default file permissions are 644.
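Since the default permissions of 644 let group members read but not modify a file, a group-writable file needs an explicit chmod. The sketch below illustrates this on a throwaway temporary file (a stand-in for a file in a shared work directory); it assumes GNU stat.

```shell
# Sketch: making a file group-writable, assuming GNU stat is available.
f=$(mktemp)                # stand-in for a file in your private work directory
chmod 644 "$f"             # the default: rw-r--r-- (group can read, not write)
chmod g+w "$f"             # grant group write: rw-rw-r--
mode=$(stat -c '%a' "$f")  # numeric mode
echo "$mode"               # 664
rm -f "$f"
```

Setting umask 002 in your shell has the same effect for newly created files.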
A typical workflow starts by copying the files you need to /lustre/scratch, and finishes by copying results back to your private /home or shared /home/work directory. Please clean up (delete) all remaining files in /lustre/scratch that are no longer needed by using the custom Lustre utilities. If you do not clean up properly, files will be purged from /lustre/scratch by the regular cleanup procedures.
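The stage-run-copy-clean pattern above can be sketched as follows. The paths here are simulated with temporary directories so the sketch runs anywhere; on the cluster they would be your workgroup directory and a directory of your own under /lustre/scratch.

```shell
# Sketch of the stage-run-copy-clean pattern, with temporary directories
# standing in for the real cluster paths.
workdir=$(mktemp -d)   # stand-in for /home/work/<investing_entity>
scratch=$(mktemp -d)   # stand-in for your directory under /lustre/scratch
echo "input data" > "$workdir/input.dat"

cp "$workdir/input.dat" "$scratch/"                            # 1. stage input to scratch
tr 'a-z' 'A-Z' < "$scratch/input.dat" > "$scratch/output.dat"  # 2. run the job on scratch
cp "$scratch/output.dat" "$workdir/"                           # 3. copy results back
rm -rf "$scratch"                                              # 4. clean up scratch

out=$(cat "$workdir/output.dat")
echo "$out"
rm -rf "$workdir"
```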
Note: A full filesystem inhibits use for everyone.
Local filesystem
Node scratch
Each compute node has its own 500 GB local hard drive, which is needed for time-critical tasks such as managing virtual memory. System usage of the local disk is kept as small as possible to leave local disk space for your applications running on the node.
Filesystem mount points
A mount point is the directory at which a filesystem is attached within the cluster's hierarchical file tree. All the filesystems appear as one unified filesystem structure. Here is a list of mount points on Farber:
Mount point | Backed up | Description |
---|---|---|
/home | yes | Permanent storage, available to all nodes. The location for each user's home directory. A per-user quota system constrains each user. |
/lustre | no | High-performance storage available to all nodes. May be used for group and for individual use. Appropriate space allocation is controlled by agreed-upon published policies. |
/scratch | no | A local storage disk only accessible by a single node. Typically used by applications running on that node and having extensive I/O. It is your responsibility to remove files left on that filesystem at the end of your job if a job scheduler hasn't removed them automatically. |
Quotas and usage
To help users maintain awareness of quotas and their usage on the /home filesystem, the my_quotas command is available to display a list of the quota-controlled filesystems on which you have storage space.
For example,
$ my_quotas
Type   Path                        In-use / kiB  Available / kiB  Pct
-----  --------------------------  ------------  ---------------  ----
user   /home/1201                  1691648       20971520         8%
group  /home/work/it_css           39649280      1048576000       4%
Home
Each user's home directory has a hard quota limit of 20 GB. To check usage, use
df -h $HOME
The example below displays the usage of the home directory (/home/1201) for the account traine as 3.0 MB used out of 20 GB.
Filesystem                      Size  Used  Avail  Use%  Mounted on
storage-nfs1:/export/home/1201  20G   3.0M  20G    1%    /home/1201
Workgroup
Each group's work directory has a quota designed to give your group 1 TB of disk space or more, depending on the number of nodes in your workgroup. Use the workgroup -g command to define the $WORKDIR environment variable, then use the df -h command to check usage.
df -h $WORKDIR
The example below shows 0 GB used of the 1000 GB total size for the it_css workgroup.
[traine@farber ~]$ workgroup -g it_css
[(it_css:traine)@farber ~]$ df -h $WORKDIR
Filesystem                        Size   Used  Avail  Use%  Mounted on
storage-nfs1:/export/work/it_css  1000G  0     1000G  0%    /home/work/it_css
Lustre
All of Lustre is considered scratch storage and is subject to removal if necessary for Lustre-performance reasons. All users can create their own directories under the /lustre/scratch directory and manage them using custom Lustre utilities. To check Lustre usage, use df -h /lustre.
The example below is based on user traine in workgroup it_css, showing 5.9 GB used from a total filesystem size of 257 TB available on Lustre.
[(it_css:traine)@farber ~]$ df -h /lustre
Filesystem                                   Size  Used  Avail  Use%  Mounted on
ddn-mds2-ib@o2ib1:ddn-mds1-ib@o2ib1:/farber  257T  5.9G  244T   1%    /lustre
Note that the df -h /lustre command shows the use of Lustre by all users.
Node scratch
The node scratch is mounted on /scratch on each of your nodes. There is no quota, and if you exceed the physical size of the disk you will get disk failure messages. To check the usage of your disk, use the df -h command on the compute node.
For example, the command
ssh n036 df -h /scratch
shows size, used and available space in M, G or T units.
Filesystem  Size  Used  Avail  Use%  Mounted on
/dev/sda2   457G  198M  434G   1%    /tmp
This node, n036, has a 500 GB disk, with 457 GB available for your applications. The /scratch filesystem will never have the total disk available; large-memory nodes use more of the disk for swap space.
We strongly recommend that you refer to the node scratch by using the environment variable $TMPDIR, which is defined by Grid Engine when using qsub or qlogin.
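A minimal sketch of a batch script using $TMPDIR follows. The job name is hypothetical, and outside the scheduler $TMPDIR is simulated so the sketch runs anywhere; under Grid Engine, $TMPDIR points at a per-job directory on /scratch that is removed when the job ends.

```shell
#!/bin/bash
#$ -N scratch-demo                 # hypothetical job name for qsub
# Grid Engine sets $TMPDIR to a per-job node-scratch directory; when run
# outside the scheduler we fall back to a temporary directory instead.
job_tmp="${TMPDIR:-$(mktemp -d)}"

echo "intermediate work" > "$job_tmp/result.txt"  # heavy I/O goes to node scratch
out=$(cat "$job_tmp/result.txt")                  # copy results off before the job ends
rm -f "$job_tmp/result.txt"
echo "$out"
```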
Recovering files
Home and Workgroup snapshots
Snapshots are read-only images of the filesystem at the time the snapshot is taken. They are available under the .zfs/snapshot directory at the base of the filesystem (e.g., $WORKDIR/.zfs/snapshot/ or $HOME/.zfs/snapshot/). The .zfs directory is hidden and does not show up in a directory listing, even with ls -a, but you can "go to" the directory with the cd command. There you will find directories named yyyymmdd-HHMM, where yyyy is the 4-digit year, mm is the 2-digit month, dd is the 2-digit day, HH is the hour of the day (in 24-hour format), and MM is the minute within that hour when the snapshot was taken. This naming allows any number of snapshots and makes it easy to identify when each snapshot was taken. Multiple snapshots are kept per day for 3 days, then daily snapshots going back a month; after this, weekly and finally monthly retention policies are in place. This allows retrieving file "backups" from the system well into the past.
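A small helper like the following (illustrative; assumes GNU date) decodes a snapshot directory name into a readable timestamp:

```shell
# Decode a snapshot directory name of the form yyyymmdd-HHMM into a
# human-readable timestamp (assumes GNU date).
snap="20141126-1215"                 # example snapshot directory name
day=${snap%-*}                       # 20141126
hm=${snap#*-}                        # 1215
h=$(echo "$hm" | cut -c1-2)          # 12
m=$(echo "$hm" | cut -c3-4)          # 15
when=$(date -d "$day $h:$m" '+%Y-%m-%d %H:%M')
echo "$when"                         # 2014-11-26 12:15
```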
When an initial snapshot is taken, no space is used, as it is a read-only reference to the current filesystem image. However, as the filesystem changes, data blocks are copied on write, which causes snapshots to consume space. These new blocks used by snapshots do not count against the 1 TB limit that the group's filesystem can reference, but they do count toward a 4 TB limit per research group (workgroup). As directories approach these limits, the number of snapshots is automatically reduced to keep the workgroup and home directories from filling up.
Some example uses of snapshots for users are:
- If a file is deleted or modified during the afternoon of November 26th, you can go to the 20141126-1215 snapshot and retrieve the file as it existed at that time.
- If a file was deleted on November 26th and you do not realize it until Monday, you can use the 20141125-2215 snapshot to retrieve the file.
Example recovering .ssh directory from snapshot
By default, your .ssh directory is set up as part of your account on the clusters with the proper SSH keys to allow you to use qlogin to connect to compute nodes. Sometimes clients report they are no longer able to use qlogin to connect to a compute node because the SSH keys have been changed, usually by accident. No worries: you can restore your .ssh directory from a snapshot taken when you know it was last working. For example, say you could use qlogin on December 1st, 2017, but you realized on December 4th, 2017 that it stopped working. The example below shows how to go to the snapshot directory in your home directory, find the snapshot corresponding to December 1st, 2017, which is 20171201-1315 for the afternoon (1:15pm) snapshot on that day, and then copy the files from this snapshot to replace the ones no longer working. Just remember: if you made other changes to these files after December 1st, you will lose those changes.
$ cd ~/.zfs/snapshot
$ ls -l
:
$ cd 20171201-1315/.ssh
$ ls -l
total 11
-rw-r--r-- 1 traine everyone  221 Jan 27  2017 authorized_keys
-rw------- 1 traine everyone 1679 Sep  3  2014 id_rsa
-rw-r--r-- 1 traine everyone  406 Sep  3  2014 id_rsa.pub
-rw-r--r-- 1 traine it_css   1221 Oct 12  2016 known_hosts
$ cp -a * ~/.ssh
cp: overwrite `/home/1201/.ssh/authorized_keys'? y
cp: overwrite `/home/1201/.ssh/id_rsa'? y
cp: overwrite `/home/1201/.ssh/id_rsa.pub'? y
cp: overwrite `/home/1201/.ssh/known_hosts'? y
$
Usage Recommendations
Home directory: Use your home directory to store private files. Application software you use will often store its configuration, history and cache files in your home directory. Generally, keep this directory free and use it for files needed to configure your environment; for example, add symbolic links in your home directory that point to files in any of the other directories. The /home filesystem is backed up with snapshots.
Workgroup directory: Use the workgroup directory (/home/work/«investing_entity») to build applications for you or your group to use, as well as to store important data, modified source, or any other files that need to be shared with your research group. See the Application development section for information on building applications. You should create a VALET package so your fellow researchers can access applications you want to share. A typical workflow is to copy the files needed from /home/work to /lustre/scratch for the actual run. The /home/work filesystem is backed up with snapshots.
Public scratch directory: Use the public Lustre scratch directory (/lustre/scratch) for files where high performance is required. Store intermediate work files there, and remove them when your current project is done; that frees up the public scratch workspace others also need. This is also a good place for sharing files and data with all users. Files in this directory are not backed up and are subject to removal. Use Lustre utilities from a compute node to check disk usage and remove files no longer needed.
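One way to spot aged intermediate files for cleanup is an age-based find. On Lustre, the client tools (e.g. lfs find) are preferred for performance; plain GNU find, shown below on a simulated directory, illustrates the idea.

```shell
# Sketch: locating files older than 30 days for cleanup.  On Lustre itself,
# prefer the Lustre client tools (e.g. "lfs find"); GNU find is used here
# on a temporary directory standing in for a /lustre/scratch area.
dir=$(mktemp -d)
touch -d "40 days ago" "$dir/old.dat"   # an aged intermediate file (GNU touch)
touch "$dir/new.dat"                    # a fresh file, left alone
aged=$(find "$dir" -type f -mtime +30)  # only old.dat matches
echo "$aged"
rm -rf "$dir"
```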
Node scratch directory: Use the node scratch directory (/scratch) for temporary files. The job scheduler software (Grid Engine) creates a subdirectory in /scratch specifically for each job's temporary files. This is done on each node assigned to the job. When the job is complete, the subdirectory and its contents are deleted. This process automatically frees up the local scratch storage that others may need. Files in node scratch directories are not available to the head node, or other compute nodes.
Lustre workgroup directory: Use the Lustre work directory (/lustre/work/«investing_entity»), if purchased by your research group, for files where high performance is required. Keep just the files needed for your job, such as scripts and large data sets used for input or created as output. Remember that this disk is not backed up and files are subject to removal if needed, so be prepared to rebuild them if necessary. With batch jobs, the queue script is a record of what you did, but for interactive work you need to take notes as a record of your work. Use Lustre utilities from a compute node to check disk usage and remove files no longer needed.