====== Building and using TELEMAC-MASCARET with VALET integration ======
TELEMAC-MASCARET is an integrated suite of solvers for use in the field of free-surface flow. Having been used in the context of many studies throughout the world, it has become one of the major standards in its field.
The text above appears at the top of the [[http://www.opentelemac.org|TELEMAC-MASCARET]] web site. For all the time the product has spent becoming //one of the major standards in its field//, it has not been made especially easy to set up, build, or use: the official installation documentation contains some distinctly problematic errors with respect to deployment on HPC systems.
This document describes a tested setup and build procedure for TELEMAC-MASCARET (to be referred to as just //TELEMAC// henceforth) that manages the software with the VALET environment tool -- though the same procedures should likely work just as well with Modules, LMod, or any of the other environment-management systems used on HPC systems.
====== Build ======
Before starting, make sure you are in your **[[abstract:caviness:app_dev:compute_env#using-workgroup-and-directories|workgroup]]** (e.g. ''workgroup -g <<investing-entity>>'') as these instructions depend on environment variables set by that command.
Please note that the procedures herein will work perfectly well if **not** installing in your workgroup's storage. The ''$WORKDIR'' base directory could just as easily be ''$HOME'' to install in the user's home directory.
Various versions of TELEMAC will coexist under a single parent directory. Using the recommended approach to software management on our HPC systems, a ''telemac'' directory will be created in the standard workgroup software directory:
$ mkdir -p "$WORKDIR/sw/telemac"
$ TELEMAC_PREFIX="$WORKDIR/sw/telemac/v8p2r0"
When this procedure was written, the newest tagged-release of TELEMAC was v8p2r0, so the version being built will be given the version id ''v8p2r0'' -- meaning it will be installed in ''$WORKDIR/sw/telemac/v8p2r0''. The ''TELEMAC_PREFIX'' environment variable is used throughout the build process to avoid retyping that path multiple times (but is not needed when running TELEMAC programs).
===== Python setup =====
The latest versions of TELEMAC make heavy use of Python scripts. There are various Python packages required by different components, but at the most basic level numpy, scipy, and matplotlib are necessary. The Python requirements will be satisfied using a //virtual environment//.
$ vpkg_require intel-python/2020u2:python3
(base) $ conda create --prefix="$TELEMAC_PREFIX" python=3 numpy=1.17 scipy=1.3 pip matplotlib=3
Ignore the instructions regarding activating the new environment. Note also that the prompt has changed to include the prefix ''(base)'', indicating a Python virtual environment is active -- the base environment that comprises the Intel Python package.
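Before moving on, a quick sanity check that the required packages are importable can save a confusing failure later. The loop below is a sketch: it assumes ''python3'' on the current ''PATH'', whereas in this setup you would invoke the new environment's own interpreter (''$TELEMAC_PREFIX/bin/python3'') instead.

```shell
# Sketch of a sanity check: report whether each required package is importable.
# Assumes "python3" is on PATH; in the real setup, invoke the environment's
# own interpreter, e.g. "$TELEMAC_PREFIX/bin/python3".
for pkg in numpy scipy matplotlib; do
    if python3 -c "import importlib.util, sys; sys.exit(0 if importlib.util.find_spec('$pkg') else 1)"; then
        echo "$pkg: found"
    else
        echo "$pkg: MISSING"
    fi
done
```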
===== Download source =====
The [[http://wiki.opentelemac.org/doku.php?id=svn_source_code_repository_for_telemac#checking-out_from_the_telemac_repository|official Subversion TELEMAC source repository]] is enormous: the ''documentation'', ''examples'', and ''notebooks'' subdirectories account for most of this, and are not required. Especially when building multiple variants of a TELEMAC release (e.g. different compilers, code changes) it's a good idea to avoid duplicating those large subdirectories among each source instance. To that end, the procedure that follows avoids checking-out those (massive!) subdirectories.
(base) $ cd "$TELEMAC_PREFIX"
(base) $ svn co http://svn.opentelemac.org/svn/opentelemac/tags/v8p2r0 --username=ot-svn-public --password='telemac1*' --non-interactive --depth=immediates src
(base) $ cd src
(base) $ for item in builds configs scripts sources; do \
svn co http://svn.opentelemac.org/svn/opentelemac/tags/v8p2r0/${item} --username=ot-svn-public --password='telemac1*' --non-interactive --depth=infinity
done
===== Build METIS =====
The METIS source must be downloaded, built, and installed as part of TELEMAC.
(base) $ cd "$TELEMAC_PREFIX/src/optionals"
(base) $ wget 'http://glaros.dtc.umn.edu/gkhome/fetch/sw/metis/metis-5.1.0.tar.gz'
(base) $ tar -xf metis-5.1.0.tar.gz
(base) $ cd metis-5.1.0
The only prerequisite for METIS is the compiler that will be used to build the rest of TELEMAC. The TELEMAC build will use Open MPI parallelism with the Intel compiler suite underneath:
(base) $ vpkg_require openmpi/4.0.2:intel
METIS 5.1.0 has a CMake build system present:
(base) $ mkdir build-20210127 ; cd build-20210127
(base) $ CC=icc CXX=icpc FC=ifort cmake -DCMAKE_INSTALL_PREFIX="$TELEMAC_PREFIX" -DCMAKE_BUILD_TYPE=Release -DSHARED=TRUE -DGKLIB_PATH=$(pwd)/../GKlib ..
(base) $ make -j 4
(base) $ make install
With METIS installed to the base directory (''$TELEMAC_PREFIX'') there's a single parent directory for //all// dependencies installed as a part of this TELEMAC version. In other words, referencing ''$TELEMAC_PREFIX/include'' for headers and ''$TELEMAC_PREFIX/lib'' for shared libraries is adequate. This installation prescription should be applied to all dependencies integrated with the build and not provided externally (e.g. AED2, MED, MUMPS).
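It is worth confirming that the METIS artifacts actually landed under the shared prefix before moving on. The check below is a sketch; note that on some Linux distributions CMake installs shared libraries under ''lib64'' rather than ''lib''.

```shell
# Verify the METIS header and shared library under the shared prefix.
# Falls back to the current directory if TELEMAC_PREFIX is unset.
prefix="${TELEMAC_PREFIX:-$PWD}"
for f in include/metis.h lib/libmetis.so; do
    if [ -e "$prefix/$f" ]; then
        echo "$f: ok"
    else
        echo "$f: missing"
    fi
done
```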
===== TELEMAC config =====
The next step is **critical** to the build and usage of this copy of TELEMAC. The ''systel.cfg'' file is an INI-style file that describes the compilers, flags, options, et al. associated with one or more //builds// of TELEMAC inside this source tree.
The config file presented here contains two configurations: //batch// and //interactive//. They will both end up referring to the same (single) build of the TELEMAC libraries and executables, with the major difference being //batch// executes all programs via the Slurm scheduler and //interactive// runs programs in the current shell. First, an apt location for the ''systel.cfg'' file is needed:
(base) $ mkdir "${TELEMAC_PREFIX}/etc"
(base) $ cat > "${TELEMAC_PREFIX}/etc/systel.cfg" <<EOT
##
## Declare what named configurations are present
##
[Configurations]
configs: interactive batch
##
## Baseline options that named configurations can override
##
[general]
modules: system
version: v8p2
options: dyn mpi
hash_char: #
# language: 1=French, 2=English
language: 2
mods_all: -I <config>
sfx_zip: .tar.gz
sfx_lib: .so
sfx_obj: .o
sfx_mod: .mod
sfx_exe:
val_root: <root>/examples
# possible val_rank: all <3 >7 6
val_rank: all
# C compiler and flags:
cc: icc
cflags: -fPIC -xHost -O3
# Fortran compiler and flags:
fc: mpifort
fflags: -fpp -fPIC -convert big_endian -xHost -O3 -DHAVE_MPI
# command to compile C source to object code:
cmd_obj_c: [cc] [cflags] -c -o <objname> <srcname> -fPIC
cmd_obj: [fc] [fflags] -c <mods> <incs> <f95name>
cmd_lib: [fc] [fflags] -shared -o <libname> <objs>
cmd_exe: [fc] [fflags] -o <exename> <objs> <libs>
# METIS:
inc_metis: -I\$METISHOME/include
libs_metis: -L\$METISHOME/lib -lmetis
# Overall library flags:
incs_all: [inc_metis]
libs_all: [libs_metis]
##
## Deviations from [general] for our configurations:
##
[interactive]
brief: interactive mode for compilation, etc.
[batch]
brief: job submission mode
options: hpc
# Running the "split" parts of the work:
par_cmdexec: <config>/partel < <partel.par> >> <partel.log>
# Running an MPI program:
mpi_cmdexec: mpirun -np <ncsize> <exename>
# HPC job template:
hpc_stdin: [hash_char]!/bin/bash -i
    [hash_char]SBATCH --job-name=<jobname>
    [hash_char]SBATCH --output=<jobname>-<time>
    [hash_char]SBATCH --partition=<queue>
    [hash_char]SBATCH --time=<walltime>
    [hash_char]SBATCH --nodes=<ncnode>
    [hash_char]SBATCH --ntasks-per-node=<nctile>
    <py_runcode>
EOT
One major deviation should be evident in the ''hpc_stdin'' shown here versus the version on the TELEMAC web site. Python's ''configparser.RawConfigParser'' class handles multiline values by //removing comment lines//. Because every key **must** have a value, a ''#'' at the start of the value on the key's own line is **not** treated as a comment -- but any continuation line beginning with ''#'' is stripped. So:
hpc_stdin: #!/bin/bash
    #SBATCH --option1
    #SBATCH --option2
    #SBATCH --option3
    mpirun help-me
is parsed as:
hpc_stdin = "#!/bin/bash\nmpirun help-me"
In other words, the batch script template cannot contain comment lines, and therefore cannot carry the embedded scheduler directives required by most schedulers (PBS, Slurm, SGE). The solution to this problem is to reference a variable that is replaced with the ''#'' character after parsing. Thus:
[general]
:
hash_char: #
:
hpc_stdin: [hash_char]!/bin/bash
    [hash_char]SBATCH --option1
    [hash_char]SBATCH --option2
    [hash_char]SBATCH --option3
    mpirun help-me
which after parsing will be
hpc_stdin = "#!/bin/bash\n#SBATCH --option1\n#SBATCH --option2\n#SBATCH --option3\nmpirun help-me"
Hooray!
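The stripping behavior is easy to reproduce with any Python 3 interpreter. This standalone snippet (not part of TELEMAC) feeds a miniature ''hpc_stdin'' through ''RawConfigParser'' -- note that the continuation lines must be indented, just as in a real ''systel.cfg'':

```shell
python3 - <<'PYEOF'
import configparser

cfg = configparser.RawConfigParser()
cfg.read_string(
    "[general]\n"
    "hpc_stdin: #!/bin/bash\n"
    "  #SBATCH --option1\n"
    "  #SBATCH --option2\n"
    "  mpirun help-me\n"
)
# The '#' on the key's own line survives (it is part of the value), but
# the indented '#SBATCH' continuation lines are stripped as comments:
print(repr(cfg.get("general", "hpc_stdin")))
# -> '#!/bin/bash\nmpirun help-me'
PYEOF
```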
===== VALET package definition =====
Representing this version of TELEMAC in VALET requires several transformations to the environment:
- Addition of ''$TELEMAC_PREFIX/lib'' for library search by the dynamic linker
- Activation of the Python virtual environment
- Setting of the TELEMAC environment variables (e.g. ''SYSTELCFG'', ''USETELCFG'')
- Addition of TELEMAC Python library to the ''PYTHONPATH''
The following command creates a new package definition file for this version of TELEMAC; if the ''telemac.vpkg_yaml'' file already exists and a new version is being added, edit the file to add the new version blocks rather than using the following command (it will overwrite the file).
If the installation is not meant to be available to all workgroup users (e.g. installed to ''$HOME'' rather than ''$WORKDIR'') the VALET package definition should be in ''~/.valet/telemac.vpkg_yaml''.
(base) $ mkdir --mode=2770 --parents "$WORKDIR/sw/valet"
(base) $ cat > "$WORKDIR/sw/valet/telemac.vpkg_yaml" <<EOT
telemac:
    prefix: $(dirname "${TELEMAC_PREFIX}")
    actions:
        - action: source
          script:
              sh: anaconda-activate.sh
          order: failure-first
          success: 0
        - bindir: \${VALET_PATH_PREFIX}/src/scripts/python3
        - variable: HOMETEL
          value: \${VALET_PATH_PREFIX}/src
          action: set
        - variable: SYSTELCFG
          value: \${VALET_PATH_PREFIX}/etc/systel.cfg
          action: set
        - variable: PYTHONUNBUFFERED
          value: true
          action: set
        - variable: PYTHONPATH
          value: \${VALET_PATH_PREFIX}/src/scripts/python3
          action: prepend-path
        - variable: METISHOME
          value: \${VALET_PATH_PREFIX}
          action: set
    versions:
        "v8p2r0:interactive":
            prefix: v8p2r0
            dependencies:
                - openmpi/4.0.2:intel
                - intel-python/2020u2:python3
            actions:
                - variable: USETELCFG
                  value: interactive
                  action: set
                - libdir: \${VALET_PATH_PREFIX}/src/builds/interactive/wrap_api/lib
                - variable: PYTHONPATH
                  value: \${VALET_PATH_PREFIX}/src/builds/interactive/wrap_api/lib
                  action: prepend-path
        "v8p2r0:batch":
            prefix: v8p2r0
            dependencies:
                - openmpi/4.0.2:intel
                - intel-python/2020u2:python3
            actions:
                - variable: USETELCFG
                  value: batch
                  action: set
                - libdir: \${VALET_PATH_PREFIX}/src/builds/batch/wrap_api/lib
                - variable: PYTHONPATH
                  value: \${VALET_PATH_PREFIX}/src/builds/batch/wrap_api/lib
                  action: prepend-path
EOT
Two variants of ''v8p2r0'' are defined in the package, corresponding to the //interactive// and //batch// configurations in ''systel.cfg''. The Intel Python and Open MPI packages are declared as dependencies, the virtual environment is activated automatically, and all necessary environment-variable changes are present.
===== Build TELEMAC =====
Finally, the TELEMAC libraries and executables can be built! First, the environment must be prepared by rolling-back to the original shell then adding our new ''telemac/v8p2r0:batch'' package:
(base) $ vpkg_rollback all
$ vpkg_require telemac/v8p2r0:batch
Adding dependency `intel/2019u5` to your environment
Adding dependency `libfabric/1.9.0` to your environment
Adding dependency `openmpi/4.0.2:intel` to your environment
Adding dependency `intel-python/2020u2:python3` to your environment
Adding package `telemac/v8p2r0:batch` to your environment
(/work/workgroup/sw/telemac/v8p2r0) $
Note that the prompt has changed again, this time showing that the TELEMAC virtual environment created with the ''conda'' command above is active.
The TELEMAC software is built with two commands:
(/work/workgroup/sw/telemac/v8p2r0) $ config.py
(/work/workgroup/sw/telemac/v8p2r0) $ compile_telemac.py -j 4
The compile may take quite some time. If the commands are successful, then the //interactive// build is created as a symbolic link to the //batch// build:
(/work/workgroup/sw/telemac/v8p2r0) $ ln -s batch "$HOMETEL/builds/interactive"
Think of the //interactive// build as a "light" copy of or an alias to the //batch// build.
Any recompilation of the TELEMAC libraries and executables should be done using the //batch// build.
===== All done! =====
That's it. TELEMAC is now built, easily added to a shell using VALET, and ready for work.
====== Test job ======
The following test job was furnished by a Caviness user. It uses a set of three Fortran source files implementing a user model.
A job comprises four distinct steps:
- Partition the problem according to the parallelism options
* Number of nodes
* Number of MPI processes per node
- Compile user sources into an executable
* This step must be done serially on the login node
- Run the executable
* Same MPI parameters as in step 1
- Merge the partitioned results
* Same MPI parameters as in step 1
We set up a test job directory, ''$WORKDIR/sw/telemac/example'', with the steering file, input files, and source code in it. The job will use a working directory //inside// this test job directory for all of the intermediate files generated during the run. The ''standard'' partition is specified as the queue for the batch steps (those that can run on compute nodes). Remember that jobs using the ''standard'' partition can be preempted, so you may want to use your workgroup partition for longer-running jobs.
===== Partitioning =====
Partitioning is done in parallel, and thus should use a compute node and the job scheduler. This implies the ''telemac/v8p2r0:batch'' package in the environment:
$ cd "$WORKDIR/sw/telemac/example"
$ vpkg_require telemac/v8p2r0:batch
(/work/workgroup/sw/telemac/v8p2r0) $ runcode.py \
--use-link --split --workdirectory $(pwd)/test_run \
--jobname telemac_test-split \
--ncsize 2 \
--nctile 2 \
--ncnode 1 \
--walltime 1-00:00:00 \
--queue standard \
telemac2d -s t2d_bump_FE.cas
A job will be submitted to the job scheduler and the grid et al. will be partitioned into two (2) domains -- since the //ncsize// = //ncnode// * //nctile// is 2((Be sure to get those three flags correct, otherwise TELEMAC may choose unexpected numbers.)).
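Because those three flags must agree, a small guard in a submission wrapper script can catch an inconsistent decomposition before anything is queued. This is a sketch with hypothetical variable names mirroring the ''--ncsize''/''--ncnode''/''--nctile'' flags:

```shell
# Abort early if the requested MPI decomposition is inconsistent:
# ncsize (total ranks) must equal ncnode (nodes) * nctile (ranks per node).
NCNODE=1
NCTILE=2
NCSIZE=2
if [ "$NCSIZE" -ne $(( NCNODE * NCTILE )) ]; then
    echo "decomposition mismatch: ncsize=$NCSIZE, ncnode*nctile=$(( NCNODE * NCTILE ))" >&2
    exit 1
fi
echo "decomposition ok: $NCSIZE ranks over $NCNODE node(s)"
```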
===== Compilation =====
Once the problem is partitioned, the executable can be built. This step should not be run on compute nodes since they lack a base development environment in the OS((This step could be run on a node in the **devel** partition on Caviness. DARWIN nodes all have a full base development environment in the OS, so the jobs can be run in a single step -- see the "[[technical:recipes:telemac#single-step-runs|Single-step runs]]" section detailing that mode of operation.)).
(/work/workgroup/sw/telemac/v8p2r0) $ vpkg_rollback
$ vpkg_require telemac/v8p2r0:interactive
(/work/workgroup/sw/telemac/v8p2r0) $ runcode.py \
--use-link --compileonly --workdirectory $(pwd)/test_run \
telemac2d -s t2d_bump_FE.cas
===== Run =====
With the problem partitioned and the executable built successfully, the job can be run. Again, this step uses the //batch// build.
(/work/workgroup/sw/telemac/v8p2r0) $ vpkg_rollback
$ vpkg_require telemac/v8p2r0:batch
(/work/workgroup/sw/telemac/v8p2r0) $ runcode.py \
--use-link --run --workdirectory $(pwd)/test_run \
--jobname telemac_test-run \
--ncsize 2 \
--nctile 2 \
--ncnode 1 \
--walltime 1-00:00:00 \
--queue standard \
telemac2d -s t2d_bump_FE.cas
===== Merge =====
If the run is successful, the partitioned results must be merged. This step also uses the //batch// build.
(/work/workgroup/sw/telemac/v8p2r0) $ vpkg_rollback
$ vpkg_require telemac/v8p2r0:batch
(/work/workgroup/sw/telemac/v8p2r0) $ runcode.py \
--use-link --merge --workdirectory $(pwd)/test_run \
--jobname telemac_test-merge \
--ncsize 2 \
--nctile 2 \
--ncnode 1 \
--walltime 1-00:00:00 \
--queue standard \
telemac2d -s t2d_bump_FE.cas
===== Single-step runs =====
Clusters that install a full base development environment on all compute nodes do not need to split the run into the four steps detailed above. On such systems, the following single-step submission should be permissible:
$ vpkg_require telemac/v8p2r0:batch
(/work/workgroup/sw/telemac/v8p2r0) $ runcode.py \
--use-link --workdirectory $(pwd)/test_run \
--jobname telemac_test \
--ncsize 2 \
--nctile 2 \
--ncnode 1 \
--walltime 1-00:00:00 \
--queue standard \
telemac2d -s t2d_bump_FE.cas