====== Using Mills' filesystems ======

===== Permanent filesystems =====

==== Home ====

Each user has 2 GB of disk space reserved for personal use on the home filesystem. Users' home directories are in ''/home''. Use the ''my_quotas'' command (see //Quotas and usage// below) to check your current usage.

An 8 TB permanent filesystem is provided on the login node (head node), mills.hpc.udel.edu.

==== Archive ====

Each research group has 1 TB of shared group storage on the archive filesystem (''/archive''). The group's directory is identified by its research-group identifier (//investing-entity//).

The 60 TB permanent archive filesystem uses 3 TB enterprise-class SATA drives in a triple-parity RAID configuration for high reliability and availability. The filesystem is accessible to the head node via 10 Gbit/s Ethernet and to the compute nodes via 1 Gbit/s Ethernet.

===== High-performance filesystem =====

==== Lustre ====

User storage is available on a high-performance Lustre-based filesystem having 172 TB of usable space. It is used for the input files, supporting data files, work files, output files, source code, and executables associated with computational tasks run on the cluster. The filesystem is accessible to all of the processor cores via QDR InfiniBand.

The Lustre filesystem is not backed up. However, it is a robust RAID-6 system; the filesystem can survive the concurrent failure of two independent hard drives and still rebuild its contents automatically.

The /lustre filesystem is partitioned as shown below:

^ Directory ^ Description ^
| work | Private work directories for individual investor-groups |
| scratch | Public scratch space for all users |
| sysadmin | System administration use |

Each investing-entity has a private work directory under ''/lustre/work''. It is group-writable, and this is where you should create and store most of your files. The system does **//not//** automatically delete files from these directories. The default group-ownership for a file created in a private work directory is the investing-entity's group, and the default permissions are 644 (owner read-write, group read-only).

Anyone may use the public scratch directory (''/lustre/scratch''). IT may purge aged files or directories in ''/lustre/scratch'' as needed to maintain acceptable filesystem performance.

**Note**: A full filesystem inhibits use for everyone.

===== Local filesystem =====

==== Node scratch ====

Each compute node has its own 1-2 TB local hard drive, which is needed for time-critical tasks such as managing virtual memory.

===== Quotas and usage =====

To help users maintain awareness of quotas and their usage on ''/home'', ''/archive'', and ''/lustre'', use the ''my_quotas'' command. It reports, for each filesystem on which you have a quota, the space in use and the space still available.

For example:

<code>
$ my_quotas
Type  Path                        In-use / kiB  Available / kiB  Pct
----- --------------------------- ------------- ---------------- ----
user  /home/...                   ...           ...              ...
group /archive/...                ...           ...              ...
group /lustre/...                 ...           ...              ...
</code>
==== Home ====
Each user's home directory has a hard quota limit of 2 GB.

==== Archive ====
Each group's archive directory has a quota of 1 TB.

==== Lustre ====

Each investing-entity originally had an informal quota for its private work directory in ''/lustre/work''; use ''my_quotas'' to see your group's current quota and usage.

To determine usage for user ''traine'', run ''my_quotas'':

<code>
[traine@mills ~]$ my_quotas
Type  Path                        In-use / kiB  Available / kiB  Pct
----- --------------------------- ------------- ---------------- ----
user  /home/...                   ...           ...              ...
group /archive/...                ...           ...              ...
group /lustre/...                 ...           ...              ...
</code>

To determine all usage on ''/lustre'' by all users, use ''df'':

<code>
[traine@mills ~]$ df -H /lustre
Filesystem           Size  Used Avail Use% Mounted on
mds1-ib@o2ib:...      ...   ...   ...   ... /lustre
[traine@mills ~]$
</code>

<note important>
Files are automatically cleaned up in ''/lustre/scratch'' as needed.
</note>

<note warning>Remember that the Lustre filesystem is not backed up.</note>

==== Node scratch ====
The node scratch is mounted on ''/scratch'' on each compute node.

For example, the command
<code>
ssh n017 df -h /scratch
</code>
shows 197 MB used from the total filesystem size of 793 GB.
<code>
Filesystem      Size  Used Avail Use% Mounted on
/dev/...        793G  197M   ...   ... /scratch
</code>
This output is for node ''n017''.

<note warning>Files in a node's scratch directory are not visible to the head node or to other compute nodes.</note>

We strongly recommend that you refer to the node scratch by using the environment variable ''$TMPDIR''.
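
As an illustration, here is a minimal sketch of a batch-job fragment that stages temporary files in node scratch via ''$TMPDIR''. The work-directory path, input file, and application name are placeholders, and the sketch assumes the scheduler sets ''TMPDIR'' to the job's private ''/scratch'' subdirectory (see //Node scratch directory// in the usage summary below):

<code bash>
#!/bin/bash
# Sketch only: WORKDIR, input.dat, and my_app are placeholders.
# Assumes the scheduler sets TMPDIR to this job's private node-scratch directory.

WORKDIR=/lustre/work/<<investing_entity>>/users/$USER/myjob

cp "$WORKDIR/input.dat" "$TMPDIR/"          # stage input onto fast node-local disk
cd "$TMPDIR"
"$WORKDIR/my_app" input.dat > output.dat    # run against the local copies
cp output.dat "$WORKDIR/"                   # copy results back before the job ends;
                                            # the TMPDIR contents are deleted afterwards
</code>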

===== Recovering files =====
==== Home backups ====

Files in your home directory and all sub-directories are backed up using the campus backup system. The **recover** command is for browsing the index of saved files and recovering selected files from the backup system.

  * Go to the original directory using the **cd** command.
  * Start an interactive recover session using the **recover** command.
  * Mark each file for recovery with the command: ''add //filename//''
  * Schedule the file recovery with the command: ''recover''

Here is a sample session in which the file ''sourceme-gcc'' is accidentally removed and then recovered:
<code>
[traine@mills ex0]$ rm sourceme-gcc
[traine@mills ex0]$ recover
Current working directory is /.../ex0
recover> add sourceme-gcc
/.../ex0/sourceme-gcc
1 file(s) marked for recovery
recover> recover
Recovering 1 file into its original location
Volumes needed (all on-line):
        d08.RO at /...
Total estimated disk space needed for recover is 4 KB.
Requesting 1 file(s), this may take a while...
Requesting 1 recover session(s) from server.
./sourceme-gcc
Received 1 file(s) from NSR server `owell-3.nss.udel.edu'
Recover completion time: Mon 20 Aug 2012 02:54:59 PM EDT
recover> quit
[dnairn@mills ex0]$ head -1 sourceme-gcc
example='...'
</code>

- | |||
- | ==== Archive snapshots ==== | ||
- | |||
- | Snapshots are read-only images of the filesystem at the time the snapshot is taken. They are available under the '' | ||
- | |||
- | |||
- | When an initial snapshot is taken, no space is used as it is a read-only reference for the current filesystem image. However, as the filesystem changes, copy-on-write of data blocks is done and will cause snapshots to use space. These new blocks used by snapshots do not count against the 1TB limit that the group' | ||
- | |||
- | Some example uses of snapshots for users are: | ||
- | |||
- | * If a file is deleted or modified during the afternoon you can go to the '' | ||
- | * If a file was deleted on Friday and you do not realize until Monday you can use the '' | ||
- | |||
- | |||
- | |||
- | |||
- | |||
===== Recommended practices =====

Generally, the /lustre filesystem provides better overall performance than the /home and /archive filesystems. This is especially true for input and output files needed or generated by jobs. The /lustre filesystem is accessible to all processor cores via (40 Gb/s) QDR InfiniBand. In comparison, the compute nodes access the /home and /archive filesystems over 1 Gb/s Ethernet.

The /archive filesystem has less space available for your group, but it has both regular snapshots and off-site duplication for recovery. The filesystem is especially useful for building applications for your group to use. The main compilers and build tools are available on the head node, and the head node can access the /archive filesystem over 10 Gb/s Ethernet.

The /home filesystem is limited in storage, and it is used by many applications to store user preference files and caches.

**Private work directories**

All members of an investing-entity share their group-writable, private work directories under ''/lustre/work'' and ''/archive''. All members of the group have full access to add, move (rename), or remove directories and files in these group-writable directories. Be careful not to move or remove any files or directories you do not own (you own the directories you created). Your fellow researchers will appreciate your good "citizenship."

You should create a personal subdirectory within any group-writable directory for your own group-related files. That will reduce the chance of others accidentally modifying or deleting your files. You will own this new personal directory, with full access for you and read-only access for your group. Fellow researchers can copy your files, but not modify them. Researchers not in your group can never see or copy your work, because the investing-entity work directory is only open to your group.

<note important>
This describes the way ownership and access are set using the default shell environment. You can make some changes with standard UNIX commands such as **chmod**, if you must. However, you can't give users outside of your group access to your files. Use the public scratch directories for sharing files.
</note>

**Public scratch directories**

All members of the cluster community share the world-writable public scratch directory, ''/lustre/scratch''. The sticky bit is set on this directory, so only a file's owner (or the directory's owner) may remove or rename files in it.

You should create a personal directory there that will be owned by you, with full access for you and read-only access for every other user. This is where you store files you are willing to share. This is also where you store files that require a large amount of disk space for a short period.

**Work directory structure**

Your group should initially consider how to organize the group's work directory. The layout might match the group's organization: some groups are organized per user, others are more project-oriented, and some groups may want a combination of these. One possibility for the Lustre work directory might look like this:

<code text>
/lustre/work/<<investing_entity>>/
    projects/
        ...
        ...
    users/
        ...
        ...
        ...
</code>

where the ''projects'' directory holds work shared by the group and the ''users'' directory holds each member's personal work.

If your research group chooses to follow this structure, we suggest the following procedure: the stakeholder should first create the **projects** and **users** directories, each group-writable and with the sticky bit set (mode 1770).

<code bash>
cd /lustre/work/<<investing_entity>>
mkdir -m 1770 projects users
</code>

==== Summary of usage recommendations ====

**Private work directory**: Use this to store the files associated with your jobs, such as your applications, job scripts, input files, and output files. Batch job scripts provide a record of what you did, but for interactive work you need to take notes as a record of your work.

**Private archive directory**: Use this for files your group wants to keep long term, such as applications built for the group; it is protected by regular snapshots and off-site duplication.

**Public scratch directory**: Use this for files you are willing to share with all users, and for files that require a large amount of disk space for a short period.

**Home directory**: Many applications store preference and cache files in your home directory. Generally, keep this directory free of large files and only use it for files needed to configure your environment, such as your shell startup files.

**Node scratch directory**: A subdirectory is created in ''/scratch'' on each node assigned to a job to hold that job's temporary files; refer to it with ''$TMPDIR''. When the job is complete, the subdirectory and its contents are deleted. This process automatically frees up the local scratch storage that others may need. Files in node scratch directories are not available to the head node or to other compute nodes.