Caviness 2021 Lustre Expansion

Throughout 2020 and into early 2021, usage of the Lustre file system on the Caviness cluster has hovered around 80% of total capacity, a level at which file system performance begins to suffer. Each time usage has crossed that threshold, an email campaign has asked all cluster users to remove unneeded files. Users have cleaned up each time, but usage has always climbed steadily afterward until the 80% threshold was exceeded again. As of early 2021, these episodes have become more frequent.

The capacity of a Lustre file system embodies two separate metrics (storage classes):

  • The total metadata entries (inodes) provided by metadata target (MDT) devices
  • The total object storage (e.g. bytes or blocks) provided by object storage target (OST) devices

Having extremely large OST capacity combined with insufficient MDT capacity leads to an inability to create additional files despite there being many bytes of object storage available. The converse also holds: surplus MDT capacity is of no use once object storage is exhausted. Thus, a critical element in provisioning a Lustre file system is balancing the two types of storage so that both are consumed at about the same rate.

On Caviness, the existing MDT and OST capacity are being consumed at nearly the same rate. As of February 23, 2021:

  • OST usage at 83%
  • MDT usage at 77%

This is actually good news: it implies a fair balance between the two storage classes under the combined usage profile of all Caviness users. Planning for the addition of capacity can therefore be guided by the existing sizing.
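The balance argument can be made concrete with a short sketch. The helper name and the threshold constant are illustrative assumptions (the 80% figure is the level at which this page says performance begins to suffer); the usage figures are those reported above.

```python
# Hypothetical helper for illustration; not part of any Lustre tooling.
PERFORMANCE_THRESHOLD = 0.80  # usage level at which performance begins to suffer

def limiting_class(usage):
    """Return (name, fraction) of the storage class closest to exhaustion."""
    name = max(usage, key=usage.get)
    return name, usage[name]

# Usage as of February 23, 2021
usage_2021_02_23 = {"OST": 0.83, "MDT": 0.77}

name, fraction = limiting_class(usage_2021_02_23)
gap = abs(usage_2021_02_23["OST"] - usage_2021_02_23["MDT"])

print(f"Limiting class: {name} at {fraction:.0%}")
print(f"Gap between classes: {gap:.0%}")
print("Over threshold:", fraction > PERFORMANCE_THRESHOLD)
```

Whichever class is fuller sets the effective fullness of the file system. A small gap between the two means neither class strands much capacity in the other.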

The Generation 2 addition to the Caviness cluster included:

  • (2) OSTs, each 120 TB in size
  • (1) MDT, 16 TB in size

The previous components of the Lustre file system were:

  • (4) OSTs, each 65 TB in size
  • (1) MDT, 4 TB in size

The additions will nearly double the object storage capacity of the Lustre file system and increase its metadata capacity fivefold.
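The arithmetic behind that estimate can be checked with a few lines, using only the component counts and sizes listed above. The variable and function names are illustrative, and sizes are treated as nominal TB without accounting for formatting overhead.

```python
# Component (count, size in TB) as listed above; nominal sizes only.
existing = {"OST": (4, 65), "MDT": (1, 4)}
added = {"OST": (2, 120), "MDT": (1, 16)}

def total_tb(components, kind):
    """Total nominal capacity in TB for one storage class."""
    count, size_tb = components[kind]
    return count * size_tb

for kind in ("OST", "MDT"):
    before = total_tb(existing, kind)
    after = before + total_tb(added, kind)
    print(f"{kind}: {before} TB -> {after} TB ({after / before:.2f}x)")
```

The "nearly double" figure describes the OSTs; the MDT capacity grows by a larger factor.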

Bringing the new capacity online will require downtime, primarily because the existing MDT and OST usage levels are so high. Every directory currently present on the Lustre file system makes use of only the existing MDT (MDT0000). Adding the 16 TB MDT0001 to the file system does not change where existing metadata is stored: metadata striping takes effect only on directories that are explicitly changed to use both MDT0000 and MDT0001. Even then, each file and directory is mapped to one of the MDTs based on its name 1).
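The name-based mapping described in the footnote can be sketched as follows. This is a minimal illustration that follows the footnote's wording literally (64-bit FNV-1, then modulo the MDT count). The function `mdt_index` and the sample names are hypothetical, and Lustre's real implementation may differ in detail, for instance in the hash variant or in exactly which bytes are hashed.

```python
# Standard 64-bit FNV parameters.
FNV64_OFFSET_BASIS = 0xCBF29CE484222325
FNV64_PRIME = 0x100000001B3
MASK64 = 0xFFFFFFFFFFFFFFFF

def fnv1_64(data: bytes) -> int:
    """64-bit FNV-1: multiply by the prime, then XOR in each byte."""
    h = FNV64_OFFSET_BASIS
    for byte in data:
        h = (h * FNV64_PRIME) & MASK64  # keep the result within 64 bits
        h ^= byte
    return h

def mdt_index(name: str, mdt_count: int = 2) -> int:
    """Hypothetical helper: map an entry name to an MDT index."""
    return fnv1_64(name.encode("utf-8")) % mdt_count

# Sample names are made up for illustration.
for name in ("results.dat", "job-001", "README"):
    print(f"{name!r} -> MDT{mdt_index(name):04d}")
```

Because the index depends only on the name and the MDT count, entries in a striped directory spread across both MDTs without any per-file bookkeeping. Directories that are not striped remain entirely on their original MDT.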


1)
The filename is hashed using a 64-bit FNV-1 function, and the hash modulo the number of MDTs (2 in this case) provides the MDT index.
  • Last modified: 2021-02-23 12:58
  • by frey