abstract:caviness:filesystems:lustre

Differences

This shows you the differences between two versions of the page.

abstract:caviness:filesystems:lustre [2020-05-29 13:35] frey
abstract:caviness:filesystems:lustre [2020-05-29 16:19] (current) frey
Line 5: Line 5:
 Buying a better (and more expensive) disk is one way to improve i/o performance, but once the fastest, most expensive disk has been purchased this path leaves no room for further improvement.  The demands of an HPC cluster with several hundred (maybe even thousands of) compute nodes quickly outpace the speed at which a single disk can shuttle bytes back and forth.  Parallelism saves the day:  store the filesystem blocks on more than one disk and the i/o performance of each will sum (to a degree).  For example, consider a computer that can move a block of data to its hard disks in //1 cycle//, with a hard disk that requires //4 cycles// to write a block.  Storing four blocks to just one hard disk would require 20 cycles: 1 cycle to move each block to the disk and 4 cycles to write it, with each block waiting on the completion of the previous:
  
-{{ serial-vs-parallel.png?300 |Writing 4 blocks to (a) one disk and (b) four disks in parallel.}}
+{{ :abstract:caviness:filesystems:serial-vs-parallel.png?300 |Writing 4 blocks to (a) one disk and (b) four disks in parallel.}}
  
 With four disks being used in parallel (example (b) above), the block writing overlaps and takes just 8 cycles to complete.
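
The cycle counts above can be reproduced with a small model.  This is only a sketch under stated assumptions: the computer sends one block at a time over a shared channel, a disk cannot accept a new block until its previous write has finished, and the function names ''serial_cycles'' and ''parallel_cycles'' are hypothetical helpers, not part of any Lustre tooling.

```python
def serial_cycles(blocks, move=1, write=4):
    """One disk: each block is moved and written before the next one starts."""
    return blocks * (move + write)


def parallel_cycles(blocks, disks, move=1, write=4):
    """Several disks: blocks are handed out round-robin.

    Assumption: the computer moves one block at a time, and a disk only
    accepts a new block once its previous write has completed.
    """
    channel_free = 0          # cycle at which the next block can be sent
    disk_free = [0] * disks   # cycle at which each disk finishes its write
    for i in range(blocks):
        d = i % disks
        start = max(channel_free, disk_free[d])
        channel_free = start + move
        disk_free[d] = start + move + write
    return max(disk_free)


if __name__ == "__main__":
    print("one disk:  ", serial_cycles(4), "cycles")
    print("four disks:", parallel_cycles(4, 4), "cycles")
```

With the default costs of 1 cycle to move and 4 cycles to write, the model gives 20 cycles for one disk and 8 cycles for four disks, matching examples (a) and (b).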
Line 14: Line 14:
  
 The Caviness cluster contains multiple //Object Storage Targets// (OSTs) in each rack, each of which contains many hard disks.  For example, ''ost0'' contains 10 SATA hard disks (8 TB each, 1 hot spare) managed as a ZFS storage pool with an SSD acting as a read cache for improved performance:
- 
 {{ :abstract:caviness:filesystems:caviness-lustre-oss_ost.png?400 |Example image of Caviness Lustre OSS/OST. }}
  
Line 38: Line 37:
 For large files or files that are internally organized as "records((A //record// consists of a fixed-size sequence of bytes; the //i//-th record exists at an easily calculated offset within the file.))", i/o performance can be further improved by //striping// the file across multiple OSTs.  Striping divides a file into a set of sequential, fixed-size chunks.  The stripes are distributed round-robin to //N// unique Lustre objects -- and thus to //N// unique OSTs.  For example, consider a 13 MiB file:
  
-{{ lustre-striping.png?500 |Lustre striping.}}
+{{ :abstract:caviness:filesystems:caviness-lustre-striping.png?500 |Lustre striping.}}
  
-Without striping, all 13 MiB of the file resides in a single object on ''OST0001'' (see (a) above).  All i/o with respect to this file is handled by ''OSS1''; appending 5 MiB to the file will grow the object to 18 MiB.
+Without striping, all 13 MiB of the file resides in a single object on ''ost0'' (see (a) above).  All i/o with respect to this file is handled by ''oss0''; appending 5 MiB to the file will grow the object to 18 MiB.
  
-With a stripe count of three and size of 4 MiB, the Lustre filesystem pre-allocates three objects on unique OSTs on behalf of the file (see (b) above).  The file is split into sequential segments of 4 MiB -- a stripe -- and the stripes are written round-robin to the objects allocated to the file.  In this case, appending 5 MiB to the file will see stripe 3 extended to a full 4 MiB and a new stripe of 2 MiB added to the object on ''OST0007''.  For large files and record-style files, striping introduces another level of parallelism that can dramatically increase the performance of programs that access them.
+With a stripe count of three and size of 4 MiB, the Lustre filesystem pre-allocates three objects on unique OSTs on behalf of the file (see (b) above).  The file is split into sequential segments of 4 MiB -- a stripe -- and the stripes are written round-robin to the objects allocated to the file.  In this case, appending 5 MiB to the file will see stripe 3 extended to a full 4 MiB and a new stripe of 2 MiB added to the object on ''ost1''.  For large files and record-style files, striping introduces another level of parallelism that can dramatically increase the performance of programs that access them.
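
The round-robin placement described above can be sketched in a few lines.  This is an illustrative model only, not how Lustre itself computes layouts: ''stripe_layout'' is a hypothetical helper, stripes are numbered from zero (so "stripe 3" is the fourth one), and the objects are simply indexed 0, 1, 2 in allocation order.

```python
MIB = 1024 * 1024


def stripe_layout(file_size, stripe_count, stripe_size):
    """Return, for each object, a list of (stripe_index, bytes_in_stripe).

    Stripes are fixed-size, sequential chunks of the file that are
    assigned round-robin to the objects; only the last stripe may be short.
    """
    objects = [[] for _ in range(stripe_count)]
    index, offset = 0, 0
    while offset < file_size:
        length = min(stripe_size, file_size - offset)
        objects[index % stripe_count].append((index, length))
        offset += length
        index += 1
    return objects


def show(file_size, stripe_count=3, stripe_size=4 * MIB):
    """Print the layout with stripe lengths converted to MiB."""
    for obj, stripes in enumerate(stripe_layout(file_size, stripe_count, stripe_size)):
        print(f"object {obj}:", [(i, n // MIB) for i, n in stripes])


if __name__ == "__main__":
    show(13 * MIB)   # the original 13 MiB file
    show(18 * MIB)   # after appending 5 MiB
```

For the 13 MiB file, the first object holds stripes 0 and 3 (4 MiB and 1 MiB).  After the 5 MiB append, stripe 3 grows to 4 MiB and a 2 MiB stripe 4 lands on the second object, as described in the text.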
  
 <note tip>File striping is established when the file is created.  Use the ''lfs setstripe'' command to pre-allocate the objects for a striped file:  ''lfs setstripe -c 4 -s 8m my_new_file.nc'' would create the file ''my_new_file.nc'' containing zero bytes with a stripe size (-s) of 8 MiB and striped across four objects (-c).</note>
  • abstract/caviness/filesystems/lustre.1590773759.txt.gz
  • Last modified: 2020-05-29 13:35
  • by frey