  - Finally, all directories under ''/lustre/scratch/altroot'' will be moved back to being under ''/lustre/scratch'' as before.

With the metadata of the new copies being striped across all MDTs, and the Lustre metadata subsystem spreading the copies across the new and old OSTs, the net effect will be to rebalance MDT and OST usage across all devices.
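
For illustration only, the per-directory copy step might look something like the following; the directory name ''some_dir'' is a placeholder, and the stripe count would match the actual number of MDTs in the file system:
<code bash>
# Create the target directory with its metadata striped across the MDTs
# (-c 2 assumes two MDTs; adjust to the real MDT count)
$ lfs mkdir -c 2 /lustre/scratch/altroot/some_dir

# Copy the contents; Lustre distributes the new files' metadata and
# objects across the MDTs and OSTs as they are created
$ rsync -a /lustre/scratch/some_dir/ /lustre/scratch/altroot/some_dir/
</code>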

===== Testing =====

All aspects of this workflow were tested using VirtualBox on a Mac laptop.  A CentOS 7 VM (of the same version as is in use on Caviness) was provisioned with Lustre 2.10.3 patchless server kernel modules installed.  This VM was diff-cloned to produce the four Lustre server VMs: mds0, mds1, oss0, and oss1.

The four VMs each had a virtual NIC configured on a named internal network (''lustre-net''), and IP addresses were assigned manually in the OS.  Connectivity between the four VMs via that network was confirmed.  LNET was configured manually after boot on each node:
<code bash>
[mds0 ~]$ modprobe lnet
[mds0 ~]$ lnetctl net configure --all
</code>
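
The resulting LNET configuration and node-to-node connectivity can be verified with standard Lustre tooling, for example:
<code bash>
# Show the LNET networks and NIDs configured on this node
[mds0 ~]$ lnetctl net show

# Ping another node at the LNET level (mds1's NID assumed to be mds1@tcp)
[mds0 ~]$ lctl ping mds1@tcp
</code>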

The following VDIs were created:
  * 50 GB - mgt
  * 250 GB - mdt0, mdt1
  * 1000 GB - ost0, ost1
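
A minimal sketch of how the VDIs might have been created and attached with ''VBoxManage''; the VDI file name, controller name, and port are assumptions:
<code bash>
# Create a 50 GB VDI for the MGT (sizes are in MB); the mdt0, mdt1,
# ost0, and ost1 VDIs would be created the same way with larger sizes
$ VBoxManage createmedium disk --filename mgt.vdi --size 51200 --format VDI

# Attach the VDI to the mds0 VM (controller name "SATA" and port assumed)
$ VBoxManage storageattach mds0 --storagectl "SATA" --port 1 --device 0 \
    --type hdd --medium mgt.vdi
</code>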
The mgt and mdt0 VDIs were attached to mds0 and formatted:
<code bash>
# MGT, served by mds0 with mds1 as its failover node
[mds0 ~]$ mkfs.lustre --mgs --reformat \
    --servicenode=mds0@tcp --servicenode=mds1@tcp \
    --backfstype=ldiskfs \
    /dev/sdb
# MDT0 of the "demo" file system, also failing over between mds0 and mds1
[mds0 ~]$ mkfs.lustre --mdt --reformat --index=0 \
    --mgsnode=mds0@tcp --mgsnode=mds1@tcp \
    --servicenode=mds0@tcp --servicenode=mds1@tcp \
    --backfstype=ldiskfs --fsname=demo \
    /dev/sdc
</code>
The ost0 VDI was attached to oss0 and formatted:
<code bash>
[oss0 ~]$ mkfs.lustre --ost --reformat --index=0 \
    --mgsnode=mds0@tcp --mgsnode=mds1@tcp \
    --servicenode=oss0@tcp --servicenode=oss1@tcp \
    --backfstype=ldiskfs --fsname=demo \
    /dev/sdb
</code>
The mgt and mdt0 were brought online:
<code bash>
[mds0 ~]$ mkdir -p /lustre/mgt /lustre/mdt{0,1}
[mds0 ~]$ mount -t lustre /dev/sdb /lustre/mgt
[mds0 ~]$ mount -t lustre /dev/sdc /lustre/mdt0
</code>
Finally, ost0 was brought online:
<code bash>
[oss0 ~]$ mkdir -p /lustre/ost{0,1}
[oss0 ~]$ mount -t lustre /dev/sdb /lustre/ost0
</code>

==== Client Setup ====

Another VM was created with the same version of CentOS 7 and the Lustre 2.10.3 client modules.  The VM also had a virtual NIC created as part of the named internal network (''lustre-net'') and an IP address assigned manually within the OS.  Connectivity to the four Lustre VMs was confirmed, and LNET was configured manually as above.

The "demo" Lustre file system was mounted on the client:
<code bash>
[client ~]$ mkdir /demo
[client ~]$ mount -t lustre mds0@tcp:mds1@tcp:/demo /demo
</code>
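
Per-MDT and per-OST usage can then be examined from the client, for example:
<code bash>
[client ~]$ lfs df -h /demo     # space usage per MDT/OST
[client ~]$ lfs df -i /demo     # inode (metadata) usage per MDT/OST
</code>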

At this point, some tests were performed in order to fill the metadata (i.e. mdt0) to approximately 70% of capacity.
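
One simple way to consume metadata capacity, for example, is to create large numbers of empty files; the directory name and file count below are arbitrary:
<code bash>
# Empty files consume inodes on the MDT but essentially no OST space
[client ~]$ mkdir /demo/fill
[client ~]$ for i in $(seq 1 500000); do touch /demo/fill/file.$i; done

# Watch the MDT inode usage climb
[client ~]$ lfs df -i /demo
</code>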

==== Addition of MDT ====

The new MDT was formatted and brought online:
<code bash>
[mds1 ~]$ mkfs.lustre --mdt --reformat --index=1 \
    --mgsnode=mds0@tcp --mgsnode=mds1@tcp \
    --servicenode=mds1@tcp --servicenode=mds0@tcp \
    --backfstype=ldiskfs --fsname=demo \
    /dev/sdb
[mds1 ~]$ mkdir -p /lustre/mgt /lustre/mdt{0,1}
[mds1 ~]$ mount -t lustre /dev/sdb /lustre/mdt1
</code>

After a few moments, the client VM received the updated file system configuration and mounted the new MDT.  MDT usage and capacity changed accordingly.  **//This indicated that an online addition of MDTs to a running Lustre file system is possible.//**
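
The client's view of the MDTs can be confirmed with, for example:
<code bash>
[client ~]$ lfs mdts /demo      # both MDT0000 and MDT0001 should be listed
[client ~]$ lfs df -i /demo     # inode capacity now includes mdt1
</code>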

Further testing was performed to confirm that:
  * by default, all metadata additions were against mdt0
  * creating a new directory with metadata striping over mdt0 and mdt1 initially allowed a balanced creation of new files across both MDTs (see the sketch after this list)
  * once mdt0 was filled to capacity, creation of new files whose names hashed and mapped to mdt0 failed; names that hashed and mapped to mdt1 succeeded
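
A minimal sketch of that striped-directory test; the directory and file names are placeholders:
<code bash>
# Create a directory whose metadata is striped across both MDTs
[client ~]$ lfs mkdir -c 2 /demo/striped
[client ~]$ lfs getdirstripe /demo/striped

# New files are hashed by name onto one of the two MDTs;
# -m reports the index of the MDT holding the file's metadata
[client ~]$ touch /demo/striped/testfile
[client ~]$ lfs getstripe -m /demo/striped/testfile
</code>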

==== Addition of OST ====

The new OST was formatted and brought online:
<code bash>
[oss1 ~]$ mkfs.lustre --ost --reformat --index=1 \
    --mgsnode=mds0@tcp --mgsnode=mds1@tcp \
    --servicenode=oss1@tcp --servicenode=oss0@tcp \
    --backfstype=ldiskfs --fsname=demo \
    /dev/sdb
[oss1 ~]$ mkdir -p /lustre/ost{0,1}
[oss1 ~]$ mount -t lustre /dev/sdb /lustre/ost1
</code>

After a few moments, the client VM received the updated file system configuration and mounted the new OST.  OST usage and capacity changed accordingly.  **//This indicated that an online addition of OSTs to a running Lustre file system is possible.//**
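
Similarly, the client's view of the OSTs can be confirmed with, for example:
<code bash>
[client ~]$ lfs osts /demo      # both OST0000 and OST0001 should be listed
[client ~]$ lfs df -h /demo     # capacity now includes ost1
</code>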