abstract:caviness:filesystems:lustre
Buying a better (and more expensive) disk is one way to improve i/o performance, but once the fastest, most expensive disk has been purchased this path leaves no room for further improvement.  The demands of an HPC cluster with several hundred (maybe even thousands of) compute nodes quickly outpace the speed at which a single disk can shuttle bytes back and forth.  Parallelism saves the day:  store the filesystem blocks on more than one disk and the i/o performance of each will sum (to a degree).  For example, consider a computer that can move data to its hard disks in //1 cycle// with a hard disk that requires //4 cycles// to write a block.  Storing four blocks to just one hard disk would require 20 cycles: 1 cycle to move each block to the disk and 4 cycles to write it, with each block waiting on the completion of the previous:
  
{{ :abstract:caviness:filesystems:serial-vs-parallel.png?300 |Writing 4 blocks to (a) one disk and (b) four disks in parallel.}}
  
With four disks being used in parallel (example (b) above), the block writing overlaps and takes just 8 cycles to complete.
  
The Caviness cluster contains multiple //Object Storage Targets// (OSTs) in each rack that each contain many hard disks.  For example, ''ost0'' contains 10 SATA hard disks (8 TB each, 1 hot spare) managed as a ZFS storage pool with an SSD acting as a read cache for improved performance:
{{ :abstract:caviness:filesystems:caviness-lustre-oss_ost.png?400 |Example image of Caviness Lustre OSS/OST. }}
  
For large files or files that are internally organized as "records((A //record// consists of a fixed-size sequence of bytes; the //i//-th record exists at an easily calculated offset within the file.))," i/o performance can be further improved by //striping// the file across multiple OSTs.  Striping divides a file into a set of sequential, fixed-size chunks.  The stripes are distributed round-robin to //N// unique Lustre objects -- and thus on //N// unique OSTs.  For example, consider a 13 MiB file:
  
{{ :abstract:caviness:filesystems:caviness-lustre-striping.png?500 |Lustre striping.}}
  
Without striping, all 13 MiB of the file resides in a single object on ''OST0001'' (see (a) above).  All i/o with respect to this file is handled by ''OSS1''; appending 5 MiB to the file will grow the object to 18 MiB.
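The round-robin distribution of stripes to objects can be sketched as follows.  The 1 MiB stripe size and stripe count of 4 used here are illustrative choices, not Caviness defaults (in Lustre, both are tunable per file or directory, e.g. with ''lfs setstripe''):

```python
MiB = 1 << 20

def object_sizes(file_size, stripe_size, stripe_count):
    """Bytes landing in each of the file's stripe_count Lustre objects.

    Stripe i (a fixed-size chunk of the file) is assigned to object
    i % stripe_count, i.e. round-robin.
    """
    sizes = [0] * stripe_count
    offset = 0
    i = 0
    while offset < file_size:
        chunk = min(stripe_size, file_size - offset)
        sizes[i % stripe_count] += chunk
        offset += chunk
        i += 1
    return sizes

def object_for_offset(offset, stripe_size, stripe_count):
    """Index (in the file's object list) of the object holding a byte offset."""
    return (offset // stripe_size) % stripe_count

# The 13 MiB example file, striped 1 MiB at a time over 4 objects:
print([s // MiB for s in object_sizes(13 * MiB, 1 * MiB, 4)])  # [4, 3, 3, 3]
```

The 13 stripes spread as 4 + 3 + 3 + 3 MiB across the four objects, so reads and writes to different regions of the file are serviced by different OSTs (and OSSes) in parallel.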
  • abstract/caviness/filesystems/lustre.txt
  • Last modified: 2020-05-29 16:19
  • by frey