Building and using TELEMAC-MASCARET with VALET integration
TELEMAC-MASCARET is an integrated suite of solvers for use in the field of free-surface flow. Having been used in the context of many studies throughout the world, it has become one of the major standards in its field.
The text above appears at the top of the TELEMAC-MASCARET web site. For all its status as a major standard in its field, the suite is not especially easy to set up, build, or actually use, and the official installation documentation contains some distinctly problematic errors with respect to deployment on HPC systems.
This document describes a tested setup and build procedure for TELEMAC-MASCARET (to be referred to as just TELEMAC henceforth) that manages the software with the VALET environment tool – though the same procedures should likely work just as well with Modules, LMod, or any of the other environment-management systems used on HPC systems.
Build
Before starting, make sure you are in your workgroup (e.g. workgroup -g «investing-entity») as these instructions depend on environment variables set when you use this command.
Please note that the procedures herein will work perfectly well if not installing in your workgroup's storage: the $WORKDIR base directory could just as easily be $HOME to install in the user's home directory.
Various versions of TELEMAC will coexist under a single parent directory. Using the recommended approach to software management on our HPC systems, a telemac directory will be created in the standard workgroup software directory:
$ mkdir -p "$WORKDIR/sw/telemac"
$ TELEMAC_PREFIX="$WORKDIR/sw/telemac/v8p2r0"
When this procedure was written, the newest tagged release of TELEMAC was v8p2r0, so the version being built will be given the version id v8p2r0 – meaning it will be installed in $WORKDIR/sw/telemac/v8p2r0. The TELEMAC_PREFIX environment variable is used throughout the build process to avoid retyping that path multiple times (but is not needed when running TELEMAC programs).
Python setup
The latest versions of TELEMAC make heavy use of Python scripts. There are various Python packages required by different components, but at the most basic level numpy, scipy, and matplotlib are necessary. The Python requirements will be satisfied using a virtual environment.
$ vpkg_require intel-python/2020u2:python3
(base) $ conda create --prefix="$TELEMAC_PREFIX" python=3 numpy=1.17 scipy=1.3 pip matplotlib=3
Ignore the instructions regarding activating the new environment. Note also that the prompt has changed to include the prefix (base), indicating a Python virtual environment is active – the base environment that comprises the Intel Python package.
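Once the environment is active, the package requirements can be sanity-checked from within it. The helper below is purely illustrative (not part of TELEMAC or its scripts):

```python
# Illustrative check that the new environment can import the packages
# TELEMAC's Python scripts require at a minimum.
import importlib.util

def missing_packages(names):
    """Return the subset of names for which no importable module is found."""
    return [n for n in names if importlib.util.find_spec(n) is None]

# Prints an empty list when numpy, scipy, and matplotlib are all installed:
print(missing_packages(["numpy", "scipy", "matplotlib"]))
```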
Download source
The official Subversion TELEMAC source repository is enormous: the documentation, examples, and notebooks subdirectories account for most of this, and are not required. Especially when building multiple variants of a TELEMAC release (e.g. different compilers, code changes) it's a good idea to avoid duplicating those large subdirectories among each source instance. To that end, the procedure that follows avoids checking-out those (massive!) subdirectories.
(base) $ cd "$TELEMAC_PREFIX"
(base) $ svn co http://svn.opentelemac.org/svn/opentelemac/tags/v8p2r0 --username=ot-svn-public --password='telemac1*' --non-interactive --depth=immediates src
(base) $ cd src
(base) $ for item in builds configs scripts sources; do \
    svn co http://svn.opentelemac.org/svn/opentelemac/tags/v8p2r0/${item} --username=ot-svn-public --password='telemac1*' --non-interactive --depth=infinity
  done
Build METIS
The METIS source must be downloaded, built, and installed as part of TELEMAC.
(base) $ cd "$TELEMAC_PREFIX/src/optionals"
(base) $ wget 'http://glaros.dtc.umn.edu/gkhome/fetch/sw/metis/metis-5.1.0.tar.gz'
(base) $ tar -xf metis-5.1.0.tar.gz
(base) $ cd metis-5.1.0
The only prerequisite for METIS is the compiler that will be used to build the rest of TELEMAC. The TELEMAC build will include Open MPI parallelism with the Intel compiler suite underlying it:
(base) $ vpkg_require openmpi/4.0.2:intel
METIS 5.1.0 has a CMake build system present:
(base) $ mkdir build-20210127 ; cd build-20210127
(base) $ CC=icc CXX=icpc FC=ifort cmake \
    -DCMAKE_INSTALL_PREFIX="$TELEMAC_PREFIX" \
    -DCMAKE_BUILD_TYPE=Release \
    -DSHARED=TRUE \
    -DGKLIB_PATH=$(pwd)/../GKlib ..
(base) $ make -j 4
(base) $ make install
With METIS installed to the base directory ($TELEMAC_PREFIX) there's a single parent directory for all dependencies installed as part of this TELEMAC version. In other words, referencing $TELEMAC_PREFIX/include for headers and $TELEMAC_PREFIX/lib for shared libraries is adequate. This installation prescription should be applied to all dependencies integrated with the build and not provided externally (e.g. AED2, MED, MUMPS).
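The convention can be summarized with a tiny sketch (a hypothetical helper, not part of TELEMAC): every in-tree dependency contributes its headers and shared libraries directly under the single prefix.

```python
# Illustrative: where an in-tree dependency's files land under the
# single $TELEMAC_PREFIX parent directory.
import os

def dependency_paths(prefix, header, library):
    """Expected header and shared-library locations under the prefix."""
    return {
        "include": os.path.join(prefix, "include", header),
        "lib": os.path.join(prefix, "lib", library),
    }

# METIS, installed above, follows this layout:
print(dependency_paths("/work/workgroup/sw/telemac/v8p2r0", "metis.h", "libmetis.so"))
```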
TELEMAC config
The next step is critical to the build and usage of this copy of TELEMAC. The systel.cfg file is an INI-style file that describes the compilers, flags, options, etc. associated with one or more builds of TELEMAC inside this source tree.
The config file presented here contains two configurations: batch and interactive. Both refer to the same (single) build of the TELEMAC libraries and executables; the major difference is that batch executes all programs via the Slurm scheduler while interactive runs programs in the current shell. First, an apt location for the systel.cfg file is needed:
(base) $ mkdir "${TELEMAC_PREFIX}/etc"
(base) $ cat <<EOT > "${TELEMAC_PREFIX}/etc/systel.cfg"
##
## Declare what named configurations are present
##
[Configurations]
configs: interactive batch

##
## Baseline options that named configurations can override
##
[general]
modules: system
version: v8p2
options: dyn mpi
hash_char: #

# language: 1=French, 2=English
language: 2

mods_all: -I <config>

sfx_zip: .tar.gz
sfx_lib: .so
sfx_obj: .o
sfx_mod: .mod
sfx_exe:

val_root: <root>/examples
# possible val_rank: all <3 >7 6
val_rank: all

# C compiler and flags:
cc: icc
cflags: -fPIC -xHost -O3

# Fortran compiler and flags:
fc: mpifort
fflags: -fpp -fPIC -convert big_endian -xHost -O3 -DHAVE_MPI

# command to compile C source to object code:
cmd_obj_c: [cc] [cflags] -c <srcName> -o <objName> -fPIC
cmd_obj: [fc] [fflags] -c <mods> <incs> <f95name>
cmd_lib: [fc] [fflags] -shared -o <libname> <objs>
cmd_exe: [fc] [fflags] -o <exename> <objs> <libs>

# METIS:
inc_metis: -I\$METISHOME/include
libs_metis: -L\$METISHOME/lib -lmetis

# Overall library flags:
incs_all: [inc_metis]
libs_all: [libs_metis]

##
## Deviations from [general] for our configurations:
##
[interactive]
brief: interactive mode for compilation, etc.

[batch]
brief: job submission mode
options: hpc

# Running the "split" parts of the work:
par_cmdexec: <config>/partel < <partel.par> >> <partel.log>

# Running an MPI program:
mpi_cmdexec: mpirun <exename>

# HPC job template:
hpc_stdin: [hash_char]!/bin/bash -i
    [hash_char]SBATCH --job-name=<jobname>
    [hash_char]SBATCH --output=<jobname>-<time>.out
    [hash_char]SBATCH --error=<jobname>-<time>.err
    [hash_char]SBATCH --time=<walltime>
    [hash_char]SBATCH --nodes=<ncnode>
    [hash_char]SBATCH --ntasks-per-node=<nctile>
    [hash_char]SBATCH --partition=<queue>
    [hash_char]SBATCH --export=NONE
    vpkg_require telemac/v8p2r0:batch
    . /opt/shared/slurm/templates/libexec/openmpi.sh
    <py_runcode>
    mpi_rc=\$?
    exit \$mpi_rc

# HPC job submission:
hpc_runcode: cp HPC_STDIN ../;cd ../;sbatch < <hpc_stdin>
EOT
One major deviation should be evident in the hpc_stdin shown here versus the one on the TELEMAC web site. Python's configparser.RawConfigParser class, which TELEMAC uses to parse this file, strips comment lines out of multi-line values. The value on the key's own line is safe even when it starts with a # – since every key must have a value, that first line cannot be treated as a comment. So:
hpc_stdin: #!/bin/bash
    #SBATCH --option1
    #SBATCH --option2
    #SBATCH --option3
    mpirun help-me
is parsed as:
hpc_stdin = "#!/bin/bash\nmpirun help-me"
In other words, the batch script template cannot contain comment lines – and since embedded scheduler directives look like comments, it cannot carry the per-job arguments required by the majority of schedulers (PBS, Slurm, SGE). The solution to this problem is to reference a variable that will be replaced with the # character after parsing. Thus:
[general]
  :
hash_char: #
  :
hpc_stdin: [hash_char]!/bin/bash
    [hash_char]SBATCH --option1
    [hash_char]SBATCH --option2
    [hash_char]SBATCH --option3
    mpirun help-me
which after parsing will be
hpc_stdin = "#!/bin/bash\n#SBATCH --option1\n#SBATCH --option2\n#SBATCH --option3\nmpirun help-me"
Hooray!
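The comment-stripping behavior and the hash_char workaround are easy to confirm directly with configparser. In the sketch below, the final str.replace() merely stands in for TELEMAC's own [key] substitution; it is an illustration, not TELEMAC code.

```python
# Demonstrate configparser stripping comment lines from a multi-line value,
# and the hash_char placeholder workaround described above.
import configparser

naive = """\
[general]
hpc_stdin: #!/bin/bash
    #SBATCH --option1
    mpirun help-me
"""

workaround = """\
[general]
hash_char: #
hpc_stdin: [hash_char]!/bin/bash
    [hash_char]SBATCH --option1
    mpirun help-me
"""

p1 = configparser.RawConfigParser()
p1.read_string(naive)
# The indented "#SBATCH" continuation line was discarded as a comment:
lost = p1.get("general", "hpc_stdin")
print(repr(lost))    # '#!/bin/bash\nmpirun help-me'

p2 = configparser.RawConfigParser()
p2.read_string(workaround)
template = p2.get("general", "hpc_stdin")
# Substitute the placeholder after parsing (stand-in for TELEMAC's [key] handling):
script = template.replace("[hash_char]", p2.get("general", "hash_char"))
print(repr(script))  # '#!/bin/bash\n#SBATCH --option1\nmpirun help-me'
```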
VALET package definition
Representing this version of TELEMAC in VALET requires several transformations to the environment:
- Addition of $TELEMAC_PREFIX/lib to the library search path used by the dynamic linker
- Activation of the Python virtual environment
- Setting of the TELEMAC environment variables (e.g. SYSTELCFG, USETELCFG)
- Addition of the TELEMAC Python library to the PYTHONPATH
The following command creates a new package definition file for this version of TELEMAC; if the telemac.vpkg_yaml file already exists and a new version is being added, edit the file to add the new version blocks rather than using the following command (which would overwrite the file).
If the installation is not meant to be available to all workgroup users (e.g. installed to $HOME rather than $WORKDIR), the VALET package definition should instead go in ~/.valet/telemac.vpkg_yaml.
(base) $ mkdir --mode=2770 --parents "$WORKDIR/sw/valet"
(base) $ cat <<EOT > "$WORKDIR/sw/valet/telemac.vpkg_yaml"
telemac:
    prefix: $(dirname "${TELEMAC_PREFIX}")
    actions:
        - action: source
          script:
              sh: anaconda-activate.sh
          order: failure-first
          success: 0
        - bindir: \${VALET_PATH_PREFIX}/src/scripts/python3
        - variable: HOMETEL
          value: \${VALET_PATH_PREFIX}/src
          action: set
        - variable: SYSTELCFG
          value: \${VALET_PATH_PREFIX}/etc/systel.cfg
          action: set
        - variable: PYTHONUNBUFFERED
          value: true
          action: set
        - variable: PYTHONPATH
          value: \${VALET_PATH_PREFIX}/src/scripts/python3
          action: prepend-path
        - variable: METISHOME
          value: \${VALET_PATH_PREFIX}
          action: set
    versions:
        "v8p2r0:interactive":
            prefix: v8p2r0
            dependencies:
                - openmpi/4.0.2:intel
                - intel-python/2020u2:python3
            actions:
                - variable: USETELCFG
                  value: interactive
                  action: set
                - libdir: \${VALET_PATH_PREFIX}/src/builds/interactive/wrap_api/lib
                - variable: PYTHONPATH
                  value: \${VALET_PATH_PREFIX}/src/builds/interactive/wrap_api/lib
                  action: prepend-path
        "v8p2r0:batch":
            prefix: v8p2r0
            dependencies:
                - openmpi/4.0.2:intel
                - intel-python/2020u2:python3
            actions:
                - variable: USETELCFG
                  value: batch
                  action: set
                - libdir: \${VALET_PATH_PREFIX}/src/builds/batch/wrap_api/lib
                - variable: PYTHONPATH
                  value: \${VALET_PATH_PREFIX}/src/builds/batch/wrap_api/lib
                  action: prepend-path
EOT
Two variants of v8p2r0 are defined in the package, corresponding to the interactive and batch configurations in systel.cfg. The Intel Python and Open MPI packages are declared as dependencies, the virtual environment is automatically activated, and all necessary changes to environment variables are present.
Build TELEMAC
Finally, the TELEMAC libraries and executables can be built! First, the environment must be prepared by rolling back to the original shell and then adding our new telemac/v8p2r0:batch package:
(base) $ vpkg_rollback all
$ vpkg_require telemac/v8p2r0:batch
Adding dependency `intel/2019u5` to your environment
Adding dependency `libfabric/1.9.0` to your environment
Adding dependency `openmpi/4.0.2:intel` to your environment
Adding dependency `intel-python/2020u2:python3` to your environment
Adding package `telemac/v8p2r0:batch` to your environment
(/work/workgroup/sw/telemac/v8p2r0) $
Note that the prompt has changed again, this time showing that the TELEMAC virtual environment created with the conda command above is active.
The TELEMAC software is built with two commands:
(/work/workgroup/sw/telemac/v8p2r0) $ config.py
(/work/workgroup/sw/telemac/v8p2r0) $ compile_telemac.py -j 4
The compile may take quite some time. If the commands are successful, then the interactive build is created as a symbolic link to the batch build:
(/work/workgroup/sw/telemac/v8p2r0) $ ln -s batch "$HOMETEL/builds/interactive"
Think of the interactive build as a "light" copy of or an alias to the batch build.
Any recompilation of the TELEMAC libraries and executables should be done using the batch build.
All done!
That's it. TELEMAC is now built, easily added to a shell using VALET, and ready for work.
Test job
The following test job was furnished by a Caviness user. It uses a set of three Fortran source files implementing a user model.
A job comprises four distinct steps:
1. Partition the problem according to the parallelism options:
   - Number of nodes
   - Number of MPI processes per node
2. Compile user sources into an executable
   - This step must be done serially, on the login node
3. Run the executable
   - Same MPI parameters as in step 1
4. Merge the partitioned results
   - Same MPI parameters as in step 1
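The four steps differ only in the mode flag and in whether scheduler options are passed. A sketch (a hypothetical helper; runcode.py and its flags are those used in the commands of this document) of the invocations:

```python
# Illustrative builder for the four runcode.py invocations, keeping the MPI
# parameters identical across the partition, run, and merge steps.
def runcode_cmd(step_flag, jobname=None, ncsize=2, nctile=2, ncnode=1):
    cmd = ["runcode.py", "--use-link", step_flag,
           "--workdirectory", "$(pwd)/test_run"]
    if jobname is not None:  # the compile-only step takes no scheduler options
        cmd += ["--jobname", jobname,
                "--ncsize", str(ncsize), "--nctile", str(nctile),
                "--ncnode", str(ncnode),
                "--walltime", "1-00:00:00", "--queue", "standard"]
    return cmd + ["telemac2d", "-s", "t2d_bump_FE.cas"]

for flag, name in [("--split", "telemac_test-split"),
                   ("--compileonly", None),
                   ("--run", "telemac_test-run"),
                   ("--merge", "telemac_test-merge")]:
    print(" ".join(runcode_cmd(flag, name)))
```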
We set up a test job directory called $WORKDIR/sw/telemac/example with our steering file, input files, and source code in this directory. The job will use a working directory inside the test job directory for all of the intermediate files during the run. The standard partition is specified as the queue for the batch steps (those that can run on compute nodes); remember that jobs can be preempted when using the standard partition, so you may want to use your workgroup partition for longer-running jobs.
Partitioning
Partitioning is done in parallel, and thus should use a compute node and the job scheduler. This implies the telemac/v8p2r0:batch package in the environment:
$ cd "$WORKDIR/sw/telemac/example"
$ vpkg_require telemac/v8p2r0:batch
(/work/workgroup/sw/telemac/v8p2r0) $ runcode.py \
    --use-link --split --workdirectory $(pwd)/test_run \
    --jobname telemac_test-split \
    --ncsize 2 \
    --nctile 2 \
    --ncnode 1 \
    --walltime 1-00:00:00 \
    --queue standard \
    telemac2d -s t2d_bump_FE.cas
A job will be submitted to the job scheduler and the grid et al. will be partitioned into two (2) domains, since ncsize = ncnode * nctile = 2.
Compilation
Once the problem is partitioned, the executable can be built. This step should not be run on compute nodes since they lack a base development environment in the OS.
(/work/workgroup/sw/telemac/v8p2r0) $ vpkg_rollback
$ vpkg_require telemac/v8p2r0:interactive
(/work/workgroup/sw/telemac/v8p2r0) $ runcode.py \
    --use-link --compileonly --workdirectory $(pwd)/test_run \
    telemac2d -s t2d_bump_FE.cas
Run
With the problem partitioned and the executable built successfully, the job can be run. Again, this step uses the batch build.
(/work/workgroup/sw/telemac/v8p2r0) $ vpkg_rollback
$ vpkg_require telemac/v8p2r0:batch
(/work/workgroup/sw/telemac/v8p2r0) $ runcode.py \
    --use-link --run --workdirectory $(pwd)/test_run \
    --jobname telemac_test-run \
    --ncsize 2 \
    --nctile 2 \
    --ncnode 1 \
    --walltime 1-00:00:00 \
    --queue standard \
    telemac2d -s t2d_bump_FE.cas
Merge
If the run is successful, the partitioned results must be merged. This step also uses the batch build.
(/work/workgroup/sw/telemac/v8p2r0) $ vpkg_rollback
$ vpkg_require telemac/v8p2r0:batch
(/work/workgroup/sw/telemac/v8p2r0) $ runcode.py \
    --use-link --merge --workdirectory $(pwd)/test_run \
    --jobname telemac_test-merge \
    --ncsize 2 \
    --nctile 2 \
    --ncnode 1 \
    --walltime 1-00:00:00 \
    --queue standard \
    telemac2d -s t2d_bump_FE.cas
Single-step runs
Clusters that install a full base development environment on all compute nodes do not need to split the run into the four steps detailed above. On such clusters, the following single-step submission should work:
$ vpkg_require telemac/v8p2r0:batch
(/work/workgroup/sw/telemac/v8p2r0) $ runcode.py \
    --use-link --workdirectory $(pwd)/test_run \
    --jobname telemac_test \
    --ncsize 2 \
    --nctile 2 \
    --ncnode 1 \
    --walltime 1-00:00:00 \
    --queue standard \
    telemac2d -s t2d_bump_FE.cas