abstract:farber:filesystems:lustre

===== A Storage Node =====
  
The Farber cluster contains five //storage appliances// that each contain many hard disks.  For example, ''storage1'' contains 36 SATA hard disks (2 TB each) arranged as six 8 TB RAID-6 units:
  
{{ osts.png |The Farber storage1 appliance.}}
  
Each of the six OST (Object Storage Target) units can survive the concurrent failure of one or two hard disks at the expense of storage space:  the raw capacity of ''storage1'' is 72 TB, but the data resilience afforded by RAID-6 costs a full third of that capacity (leaving 48 TB).
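
The arithmetic behind those figures is straightforward; the short Python sketch below (purely illustrative, not part of any cluster tooling) reproduces the raw and usable capacities quoted above for ''storage1''.

<code python>
# Illustrative capacity arithmetic for the storage1 appliance described above;
# the figures (36 disks, 2 TB each, six 6-disk RAID-6 units) come from this page.

DISK_TB = 2          # capacity of each SATA hard disk, in TB
DISKS_PER_OST = 6    # disks in each RAID-6 unit (OST)
NUM_OSTS = 6         # RAID-6 units in the appliance
PARITY_DISKS = 2     # RAID-6 gives up two disks' worth of space per unit

raw_tb = DISK_TB * DISKS_PER_OST * NUM_OSTS                   # 72 TB raw
usable_per_ost_tb = DISK_TB * (DISKS_PER_OST - PARITY_DISKS)  # 8 TB per OST
usable_tb = usable_per_ost_tb * NUM_OSTS                      # 48 TB usable

print(f"raw: {raw_tb} TB, usable: {usable_tb} TB, "
      f"parity overhead: {raw_tb - usable_tb} TB (one third of raw)")
</code>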
  * File system capacity is not limited by hard disk size
  
The capacity of a Lustre filesystem is the sum of its constituent OSTs, so a Lustre filesystem's capacity can be grown by the addition of OSTs (and possibly OSSs to serve them).  For example, should the 172 TB Lustre filesystem begin to approach its capacity, additional capacity could be added with zero downtime by buying and installing another OSS pair.
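
As a minimal sketch of that aggregation (the OST counts and sizes below are assumed purely for illustration, not the actual Farber layout), the filesystem's capacity is just the sum of its OST capacities, and a new OSS pair simply contributes more OSTs to that sum:

<code python>
# Minimal sketch, assuming hypothetical OST sizes: a Lustre filesystem's
# capacity is the sum of its OSTs, so adding an OSS pair (with its OSTs)
# grows the total without disturbing the existing servers.

existing_osts_tb = [8.0] * 12       # assumed: a dozen 8 TB OSTs already in service
new_oss_pair_osts_tb = [8.0] * 12   # assumed: OSTs brought online by a new OSS pair

before_tb = sum(existing_osts_tb)
after_tb = sum(existing_osts_tb) + sum(new_oss_pair_osts_tb)

print(f"capacity before: {before_tb:.0f} TB, after adding the OSS pair: {after_tb:.0f} TB")
</code>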
  
<note important>Creating extremely large filesystems has one drawback:  traversing the filesystem takes so much time that it becomes impossible to create off-site backups for further data resilience.  For this reason Lustre filesystems are most often treated as volatile/scratch storage.</note>
  
<code>
[traine@farber ~]$ lrm
usage:
  
</code>
  
The example below shows user ''traine'' in workgroup ''it_nss'' on compute node ''n012'' removing the ''/lustre/scratch/traine/projects/namd'' directory and all of its files and subdirectories using the ''%%--%%recursive'' option. The additional ''%%--%%summary'' option is also used to display how much space was freed, in bytes. Note that ''traine'' was already in ''/lustre/scratch/traine/projects'' before using ''qlogin'' to log in to the compute node ''n012''.
  
<code>
[(it_nss:traine)@farber projects]$ qlogin
Your job 369292 ("QLOGIN") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 369292 has been successfully scheduled.
Establishing /opt/shared/OpenGridScheduler/local/qlogin_ssh session to host n012 ...
Last login: Thu Aug 22 14:32:16 2013 from login000
  
[traine@n012 projects]$ pwd
/lustre/scratch/traine/projects
  
[traine@n012 projects]$ lrm --summary --recursive --stat-limit 100 --unlink-limit 100 ./namd
</code>
  
<code>
[traine@farber ~]$ ldu
usage:
  
</code>
  
The example below shows user ''traine'' in workgroup ''it_nss'' on compute node ''n012'' summarizing their disk usage of the ''/lustre/scratch/traine/projects'' directory. Note that ''traine'' was already in ''/lustre/scratch/traine/projects'' before using ''qlogin'' to log in to compute node ''n012''.
  
<code>
[(it_nss:traine)@farber projects]$ qlogin
Your job 369292 ("QLOGIN") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 369292 has been successfully scheduled.
Establishing /opt/shared/OpenGridScheduler/local/qlogin_ssh session to host n012 ...
Last login: Thu Aug 22 14:32:16 2013 from login000
  
[traine@n012 projects]$ pwd
/lustre/scratch/traine/projects
  
[traine@n012 projects]$ ldu --human-readable --stat-limit 100 ./
[2013-08-22 13:49:19-0400] leon_stat:  25838 calls over 183.257 seconds (141 calls/sec)
[2013-08-22 13:50:43-0400] leon_stat:  26778 calls over 266.790 seconds (100 calls/sec)
821.07 GiB    /lustre/scratch/traine/projects
</code>
  