====== Lustre MPI-IO in Open MPI ======
In the Lustre filesystem, a file's striping can only be set at the moment the file is created. Attempting to change the striping of an existing file will fail. There are two ways to use striped Lustre files with the ROMIO MPI-IO implementation that ships with Open MPI:
- Create the file from the shell before your program executes
- Use ''MPI_Info'' parameters passed to the MPI-IO ''MPI_File_open()'' function inside your program
===== Pre-Create the File =====
An empty, striped file can be created using the ''lfs setstripe'' command from the shell:
$ lfs setstripe --size 4M --count 2 scratch.txt
$ ls -l scratch.txt
-rw-r--r-- 1 traine it_css 0 Sep 11 12:33 scratch.txt
Your code must not pass ''MPI_MODE_EXCL'' to ''MPI_File_open()'': the file already exists when the program runs, so an exclusive create would fail.
===== MPI_Info Parameters =====
The Open MPI Lustre ADIO module recognizes two info keys that control striping:
^Key ^Value ^
|''striping_factor''|The number of OSTs across which the file is striped.|
|''striping_unit''|The file is broken into chunks of this many bytes.|
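Because ''MPI_Info'' values are strings, numeric striping parameters have to be formatted as text before they are set. The following is a minimal sketch of that step, not part of Open MPI: ''format_striping_hints()'' and ''LUSTRE_STRIPE_ALIGN'' are hypothetical names, and the check that the stripe size is a multiple of 64 KiB is an assumption about what Lustre accepts. Rejecting non-positive stripe counts is a simplification of this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Assumption: Lustre stripe sizes must be a multiple of 64 KiB. */
#define LUSTRE_STRIPE_ALIGN 65536UL

/*
 * Hypothetical helper: render a stripe count and stripe size as the
 * string values expected for "striping_factor" and "striping_unit".
 * Returns 0 on success, -1 if an argument is rejected or a buffer
 * is too small.
 */
int format_striping_hints(int stripe_count, size_t stripe_size,
                          char *factor_buf, size_t factor_len,
                          char *unit_buf, size_t unit_len)
{
    int n;

    assert(factor_buf != NULL && unit_buf != NULL);

    /* This sketch only accepts an explicit, positive stripe count. */
    if (stripe_count < 1) return -1;

    /* Reject sizes that are zero or not 64 KiB aligned. */
    if (stripe_size == 0 || stripe_size % LUSTRE_STRIPE_ALIGN != 0) return -1;

    n = snprintf(factor_buf, factor_len, "%d", stripe_count);
    if (n < 0 || (size_t)n >= factor_len) return -1;

    n = snprintf(unit_buf, unit_len, "%zu", stripe_size);
    if (n < 0 || (size_t)n >= unit_len) return -1;

    return 0;
}
```

The two buffers can then be handed to ''MPI_Info_set()'' as the values for ''striping_factor'' and ''striping_unit''.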
Construct an ''MPI_Info'' entity containing the striping properties for the file:
MPI_Info scrfinfo;
MPI_Info_create(&scrfinfo);
/* How many OSTs to stripe across? */
MPI_Info_set(scrfinfo, "striping_factor", "4");
/* How many bytes per stripe? */
MPI_Info_set(scrfinfo, "striping_unit", "65536");
This ''MPI_Info'' is then passed to the ''MPI_File_open()'' function. Beware: you cannot use the ''MPI_MODE_EXCL'' flag in your call to ''MPI_File_open()'', even though the file is not yet present on the filesystem! The Lustre ADIO module exploits the fact that its "set info" callback is executed before its "open file" callback. If the ''MPI_Info'' contains any striping properties, the "set info" callback declares the new file (using a special Lustre flag to the ''open()'' system function) and sets the striping properties using ''ioctl()'', which is the step that actually commits the new file to the Lustre MDS. If that succeeds, "set info" closes the file descriptor, which lets the "open file" callback make its own ''open()'' call to prepare the file for I/O. The "set info" callback respects the ''MPI_MODE_EXCL'' flag, but the "open file" callback subsequently also requires that the file not be present. Since "set info" has just created the file, "open file" fails.
This behavior is obviously incorrect; the Lustre ADIO module should be modified such that the ''ADIOI_LUSTRE_SetInfo()'' function removes the ''MPI_MODE_EXCL'' requirement if it succeeds in creating the striped file. A bug will be filed with the Open MPI folks, so this behavior may be remedied in the future.
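The failure can be reproduced in miniature with plain POSIX calls. The sketch below is only an analogy for the two-callback sequence described above, not ROMIO's actual code: it uses no Lustre-specific flag or ''ioctl()'', and ''second_exclusive_open_errno()'' is a hypothetical name. It creates a file exclusively, closes it, and then attempts a second exclusive create on the same path.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/*
 * Hypothetical demo: returns the errno of the second exclusive open,
 * 0 if that open unexpectedly succeeded, or -1 if the first open
 * failed (for example because the path already existed).
 */
int second_exclusive_open_errno(const char *path)
{
    int fd, saved;

    assert(path != NULL);

    /* Stage 1 (stands in for "set info"): create exclusively, then close. */
    fd = open(path, O_CREAT | O_EXCL | O_RDWR, 0644);
    if (fd < 0) return -1;
    close(fd);

    /* Stage 2 (stands in for "open file"): exclusive create again. */
    errno = 0;
    fd = open(path, O_CREAT | O_EXCL | O_RDWR, 0644);
    saved = (fd < 0) ? errno : 0;
    if (fd >= 0) close(fd);

    /* Remove the demo file so the function can be run repeatedly. */
    unlink(path);
    return saved;
}
```

On a POSIX system the second open is expected to fail with ''EEXIST'', which is the same conflict the "open file" callback runs into when ''MPI_MODE_EXCL'' is set.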
Here is an example function for creating a striped Lustre file that will use MPI-IO:
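#include <errno.h>
#include <stdio.h>
#include <sys/stat.h>
#include "mpi.h"

MPI_File
NSSCreateLustreFile(
    MPI_Comm    comm,
    const char* path,
    int         stripeCount,
    size_t      stripeSize,
    int         *errorCode
)
{
    struct stat fInfo;

    /* Only proceed if stat() fails, i.e. the file is not already there: */
    if ( stat(path, &fInfo) != 0 ) {
        int         rc = MPI_SUCCESS;
        MPI_File    fh = MPI_FILE_NULL;
        MPI_Info    finfo = MPI_INFO_NULL;

        if ( (rc = MPI_Info_create(&finfo)) == MPI_SUCCESS ) {
            char    strForm[32];

            /* Collapse any negative stripe count to -1: */
            if ( stripeCount < 0 ) stripeCount = -1;
            snprintf(strForm, sizeof(strForm), "%d", stripeCount);
            if ( (rc = MPI_Info_set(finfo, "striping_factor", strForm)) == MPI_SUCCESS ) {
                /* stripeSize is unsigned, so it needs no negative check: */
                snprintf(strForm, sizeof(strForm), "%zu", stripeSize);
                if ( (rc = MPI_Info_set(finfo, "striping_unit", strForm)) == MPI_SUCCESS ) {
                    /* No MPI_MODE_EXCL here, for the reason given above: */
                    rc = MPI_File_open(
                                comm,
                                (char*)path,
                                MPI_MODE_RDWR | MPI_MODE_CREATE,
                                finfo,
                                &fh
                            );
                }
            }
            MPI_Info_free(&finfo);
        }
        if ( rc != MPI_SUCCESS ) {
            /* On this path errorCode holds an MPI error code: */
            if ( errorCode ) *errorCode = rc;
            return MPI_FILE_NULL;
        }
        return fh;
    } else {
        /* On this path errorCode holds an errno value, not an MPI error code: */
        if ( errorCode ) *errorCode = EEXIST;
    }
    return MPI_FILE_NULL;
}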
Were this function called (successfully) with the following arguments
:
scratchFile = NSSCreateLustreFile(MPI_COMM_WORLD, "mpibounce.scr", 4, 65536, NULL);
:
the success of the striping is evident via ''lfs getstripe'':
$ lfs getstripe mpibounce.scr
mpibounce.scr
lmm_stripe_count: 4
lmm_stripe_size: 65536
lmm_stripe_offset: 17
    obdidx       objid        objid      group
        17    20020417    0x1317cc1          0
        23    19758316    0x12d7cec          0
         0    19895589    0x12f9525          0
         6    19804152    0x12e2ff8          0
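The object list above can be related to byte offsets in the file. The sketch below assumes that Lustre distributes consecutive ''striping_unit''-sized chunks round-robin over the objects in the order listed. ''stripe_index_for_offset()'' is a hypothetical helper, not part of Lustre or MPI.

```c
#include <assert.h>

/*
 * Hypothetical helper: for a byte offset into a striped file, return
 * the position (0-based) in the object list that holds that byte,
 * assuming round-robin placement of stripe_size-byte chunks.
 */
int stripe_index_for_offset(unsigned long long offset,
                            unsigned long long stripe_size,
                            int stripe_count)
{
    assert(stripe_size > 0 && stripe_count > 0);

    /* Which chunk the offset falls in, wrapped over the stripe count. */
    return (int)((offset / stripe_size) % (unsigned long long)stripe_count);
}
```

Under that assumption, with a stripe size of 65536 and a stripe count of 4, byte offset 200000 lies in chunk 3 and so maps to the fourth entry in the list, which in the output above is the object on OST index 6. Offset 262144 wraps around to the first entry, OST index 17.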