
Building and using TELEMAC-MASCARET with VALET integration

TELEMAC-MASCARET is an integrated suite of solvers for use in the field of free-surface flow. Having been used in the context of many studies throughout the world, it has become one of the major standards in its field.

The text above appears at the top of the TELEMAC-MASCARET web site. In all the time this product has spent becoming one of the major standards in its field, its developers do not seem to have made it especially easy to set up, build, or actually use. The official installation documentation contains some distinctly problematic errors with respect to deployment on HPC systems.

This document describes a tested setup and build procedure for TELEMAC-MASCARET (referred to simply as TELEMAC from here on) that manages the software with the VALET environment tool – though the same procedures should work just as well with Environment Modules, Lmod, or any of the other environment-management systems used on HPC clusters.

Build

Before starting, make sure you are in your workgroup (e.g. workgroup -g «investing-entity») as these instructions depend on environment variables set when you use this command.

Please note that the procedures herein work perfectly well if you are not installing into your workgroup's storage; the $WORKDIR base directory could just as easily be $HOME to install into the user's home directory.

Various versions of TELEMAC will coexist under a single parent directory. Using the recommended approach to software management on our HPC systems, a telemac directory will be created in the standard workgroup software directory:

$ mkdir -p "$WORKDIR/sw/telemac"
$ TELEMAC_PREFIX="$WORKDIR/sw/telemac/v8p2r0"

When this procedure was written, the newest tagged release of TELEMAC was v8p2r0, so the version being built is given the version id v8p2r0 – meaning it will be installed in $WORKDIR/sw/telemac/v8p2r0. The TELEMAC_PREFIX environment variable is used throughout the build process to avoid retyping that path multiple times (it is not needed when running TELEMAC programs).
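
As noted above, the same layout works for a personal installation under the home directory; only the value of TELEMAC_PREFIX changes (the path below is just an illustration) and the rest of the procedure is identical:

$ mkdir -p "$HOME/sw/telemac"
$ TELEMAC_PREFIX="$HOME/sw/telemac/v8p2r0"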

Python setup

The latest versions of TELEMAC make heavy use of Python scripts. There are various Python packages required by different components, but at the most basic level numpy, scipy, and matplotlib are necessary. The Python requirements will be satisfied using a virtual environment.

$ vpkg_require intel-python/2020u2:python3
(base) $ conda create --prefix="$TELEMAC_PREFIX" python=3 numpy=1.17 scipy=1.3 pip matplotlib=3

Ignore the instructions conda prints about activating the new environment. Note also that the prompt has changed to include the prefix (base), indicating a Python virtual environment is active – in this case the base environment provided by the Intel Python package.
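
As an optional sanity check, the packages installed into the new environment can be listed before proceeding (the exact versions reported depend on what the conda resolver selected):

(base) $ conda list --prefix="$TELEMAC_PREFIX" | grep -E 'numpy|scipy|matplotlib'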

Download source

The official Subversion TELEMAC source repository is enormous: the documentation, examples, and notebooks subdirectories account for most of that bulk, and they are not required. Especially when building multiple variants of a TELEMAC release (e.g. different compilers, code changes), it is a good idea to avoid duplicating those large subdirectories in each source instance. To that end, the procedure that follows avoids checking out those (massive!) subdirectories.

(base) $ cd "$TELEMAC_PREFIX"
(base) $ svn co http://svn.opentelemac.org/svn/opentelemac/tags/v8p2r0 --username=ot-svn-public --password='telemac1*' --non-interactive --depth=immediates src
(base) $ cd src
(base) $ for item in builds configs scripts sources; do
             svn co http://svn.opentelemac.org/svn/opentelemac/tags/v8p2r0/${item} --username=ot-svn-public --password='telemac1*' --non-interactive --depth=infinity
         done
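
At this point the sparse checkout should contain the four fully-populated subdirectories plus the repository's top-level files; a quick look confirms nothing obviously went wrong (the reported size will vary, but should be far smaller than a full checkout):

(base) $ ls "$TELEMAC_PREFIX/src"
(base) $ du -sh "$TELEMAC_PREFIX/src"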

Build METIS

The METIS source must be downloaded, built, and installed as part of TELEMAC.

(base) $ cd "$TELEMAC_PREFIX/src/optionals"
(base) $ wget 'http://glaros.dtc.umn.edu/gkhome/fetch/sw/metis/metis-5.1.0.tar.gz'
(base) $ tar -xf metis-5.1.0.tar.gz
(base) $ cd metis-5.1.0

The only prerequisite for METIS is the compiler that will be used to build the rest of TELEMAC. This TELEMAC build will use Open MPI parallelism with the Intel compiler suite underneath it:

(base) $ vpkg_require openmpi/4.0.2:intel

METIS 5.1.0 ships with a CMake build system:

(base) $ mkdir build-20210127 ; cd build-20210127
(base) $ CC=icc CXX=icpc FC=ifort cmake -DCMAKE_INSTALL_PREFIX="$TELEMAC_PREFIX" -DCMAKE_BUILD_TYPE=Release -DSHARED=TRUE -DGKLIB_PATH=$(pwd)/../GKlib ..
(base) $ make -j 4
(base) $ make install

With METIS installed to the base directory ($TELEMAC_PREFIX) there's a single parent directory for all dependencies installed as a part of this TELEMAC version. In other words, referencing $TELEMAC_PREFIX/include for headers and $TELEMAC_PREFIX/lib for shared libraries is adequate. This installation prescription should be applied to all dependencies integrated with the build and not provided externally (e.g. AED2, MED, MUMPS).
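
A quick optional check that METIS landed where the systel.cfg below expects it, namely a header under $TELEMAC_PREFIX/include and a shared library under $TELEMAC_PREFIX/lib:

(base) $ ls "$TELEMAC_PREFIX/include/metis.h" "$TELEMAC_PREFIX/lib/libmetis"*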

TELEMAC config

The next step is critical to both the build and the usage of this copy of TELEMAC. The systel.cfg file is an INI-style file that describes the compilers, flags, options, and so on associated with one or more builds of TELEMAC inside this source tree.

The config file presented here contains two configurations: batch and interactive. They will both end up referring to the same (single) build of the TELEMAC libraries and executables, with the major difference being batch executes all programs via the Slurm scheduler and interactive runs programs in the current shell. First, an apt location for the systel.cfg file is needed:

(base) $ mkdir "${TELEMAC_PREFIX}/etc"
(base) $ cat <<EOT > "${TELEMAC_PREFIX}/etc/systel.cfg"
##
## Declare what named configurations are present
##
[Configurations]
configs: interactive batch
 
##
## Baseline options that named configurations can override
##
[general]
modules:       system
version:       v8p2
options:       dyn mpi
 
hash_char:     #
 
# language:  1=French, 2=English
language:      2
 
mods_all:      -I <config>
 
sfx_zip:       .tar.gz
sfx_lib:       .so
sfx_obj:       .o
sfx_mod:       .mod
sfx_exe:
 
val_root:      <root>/examples
 
# possible val_rank:   all <3 >7 6
val_rank:      all
 
# C compiler and flags:
cc:            icc
cflags:        -fPIC -xHost -O3
 
# Fortran compiler and flags:
fc:            mpifort
fflags:        -fpp -fPIC -convert big_endian -xHost -O3 -DHAVE_MPI
 
# commands to compile sources to object code and to link libraries/executables:
cmd_obj_c:     [cc] [cflags] -c <srcName> -o <objName> -fPIC
cmd_obj:       [fc] [fflags] -c <mods> <incs> <f95name>
cmd_lib:       [fc] [fflags] -shared -o <libname> <objs>
cmd_exe:       [fc] [fflags] -o <exename> <objs> <libs>
 
# METIS:
inc_metis:     -I\$METISHOME/include
libs_metis:    -L\$METISHOME/lib -lmetis
 
# Overall library flags:
incs_all:      [inc_metis]
libs_all:      [libs_metis]
 
##
## Deviations from [general] for our configurations:
##
[interactive]
brief:         interactive mode for compilation, etc.
 
[batch]
brief:         job submission mode
options:       hpc
 
# Running the "split" parts of the work:
par_cmdexec:   <config>/partel < <partel.par> >> <partel.log>
 
# Running an MPI program:
mpi_cmdexec:   mpirun <exename>
 
# HPC job template:
hpc_stdin: [hash_char]!/bin/bash -i
    [hash_char]SBATCH --job-name=<jobname>
    [hash_char]SBATCH --output=<jobname>-<time>.out
    [hash_char]SBATCH --error=<jobname>-<time>.err
    [hash_char]SBATCH --time=<walltime>
    [hash_char]SBATCH --nodes=<ncnode>
    [hash_char]SBATCH --ntasks-per-node=<nctile>
    [hash_char]SBATCH --partition=<queue>
    [hash_char]SBATCH --export=NONE
    vpkg_require telemac/v8p2r0:batch
    . /opt/shared/slurm/templates/libexec/openmpi.sh
    <py_runcode>
    mpi_rc=\$?
    exit \$mpi_rc
 
# HPC job submission:
hpc_runcode:   cp HPC_STDIN ../;cd ../;sbatch < <hpc_stdin>
 
EOT

One major deviation from the hpc_stdin shown on the TELEMAC web site should be evident here. TELEMAC parses this file with Python's configparser.RawConfigParser class, which removes comment lines from multi-line values. The first line of the value sits on the same line as the key, so it is never treated as a comment even if it starts with #, but every indented continuation line beginning with # is discarded. So:

hpc_stdin:  #!/bin/bash
    #SBATCH --option1
    #SBATCH --option2
    #SBATCH --option3
    mpirun help-me

is parsed as:

hpc_stdin = "#!/bin/bash\nmpirun help-me"

In other words, the batch script template cannot contain comment lines, which means it cannot carry the embedded scheduler directives (e.g. #SBATCH, #PBS) that the majority of schedulers (Slurm, PBS, SGE) rely on. The solution to this problem is to reference a variable that is replaced with the # character after parsing. Thus:

[general]
  :
hash_char:  #
  :
hpc_stdin:  [hash_char]!/bin/bash
    [hash_char]SBATCH --option1
    [hash_char]SBATCH --option2
    [hash_char]SBATCH --option3
    mpirun help-me

which after parsing will be

hpc_stdin = "#!/bin/bash\n#SBATCH --option1\n#SBATCH --option2\n#SBATCH --option2\nmpirun help-me"

Hooray!
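
If desired, this behavior can be confirmed directly against the installed systel.cfg. The snippet below is just a sketch, assuming TELEMAC reads the file with RawConfigParser defaults as described above (the shell expands $TELEMAC_PREFIX before Python sees it); it should print the hpc_stdin template with every [hash_char] line intact:

(base) $ python3 - <<EOT
# Parse systel.cfg the same way configparser.RawConfigParser would and show
# that no line of the multi-line hpc_stdin value was dropped as a comment.
import configparser
cp = configparser.RawConfigParser()
cp.read("$TELEMAC_PREFIX/etc/systel.cfg")
print(cp.get("batch", "hpc_stdin"))
EOT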

VALET package definition

Representing this version of TELEMAC in VALET requires several transformations to the environment:

  1. Addition of $TELEMAC_PREFIX/lib for library search by the dynamic linker
  2. Activation of the Python virtual environment
  3. Setting of the TELEMAC environment variables (e.g. SYSTELCFG, USETELCFG)
  4. Addition of TELEMAC Python library to the PYTHONPATH

The following command creates a new package definition file for this version of TELEMAC; if the telemac.vpkg_yaml file already exists and a new version is being added, edit the file to add the new version blocks rather than using the following command (it will overwrite the file).

If the installation is not meant to be available to all workgroup users (e.g. installed to $HOME rather than $WORKDIR) the VALET package definition should be in ~/.valet/telemac.vpkg_yaml.

(base) $ mkdir --mode=2770 --parents "$WORKDIR/sw/valet"
(base) $ cat <<EOT > "$WORKDIR/sw/valet/telemac.vpkg_yaml"
telemac:
    prefix: $(dirname "${TELEMAC_PREFIX}")
    actions:
        - action: source
          script:
              sh: anaconda-activate.sh
          order: failure-first
          success: 0
        - bindir: \${VALET_PATH_PREFIX}/src/scripts/python3
        - variable: HOMETEL
          value: \${VALET_PATH_PREFIX}/src
          action: set
        - variable: SYSTELCFG
          value: \${VALET_PATH_PREFIX}/etc/systel.cfg
          action: set
        - variable: PYTHONUNBUFFERED
          value: true
          action: set
        - variable: PYTHONPATH
          value: \${VALET_PATH_PREFIX}/src/scripts/python3
          action: prepend-path
        - variable: METISHOME
          value: \${VALET_PATH_PREFIX}
          action: set
    versions:
        "v8p2r0:interactive":
            prefix: v8p2r0
            dependencies:
                - openmpi/4.0.2:intel
                - intel-python/2020u2:python3
            actions:
                - variable: USETELCFG
                  value: interactive
                  action: set
                - libdir: \${VALET_PATH_PREFIX}/src/builds/interactive/wrap_api/lib
                - variable: PYTHONPATH
                  value: \${VALET_PATH_PREFIX}/src/builds/interactive/wrap_api/lib
                  action: prepend-path
        "v8p2r0:batch":
            prefix: v8p2r0
            dependencies:
                - openmpi/4.0.2:intel
                - intel-python/2020u2:python3
            actions:
                - variable: USETELCFG
                  value: batch
                  action: set
                - libdir: \${VALET_PATH_PREFIX}/src/builds/batch/wrap_api/lib
                - variable: PYTHONPATH
                  value: \${VALET_PATH_PREFIX}/src/builds/batch/wrap_api/lib
                  action: prepend-path
 
EOT

Two variants of v8p2r0 are defined in the package, corresponding to the interactive and batch configurations in systel.cfg. The Intel Python and Open MPI packages are declared as dependencies, the virtual environment is activated automatically, and all necessary changes to environment variables are present.
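
Assuming VALET is configured to search $WORKDIR/sw/valet (or ~/.valet for a personal install) for package definition files, the new package and both of its variants should now be visible; vpkg_versions is a convenient way to confirm the file is being picked up:

(base) $ vpkg_versions telemac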

Build TELEMAC

Finally, the TELEMAC libraries and executables can be built! First, the environment must be prepared by rolling back to the original shell and then adding the new telemac/v8p2r0:batch package:

(base) $ vpkg_rollback all
$ vpkg_require telemac/v8p2r0:batch
Adding dependency `intel/2019u5` to your environment
Adding dependency `libfabric/1.9.0` to your environment
Adding dependency `openmpi/4.0.2:intel` to your environment
Adding dependency `intel-python/2020u2:python3` to your environment
Adding package `telemac/v8p2r0:batch` to your environment
(/work/workgroup/sw/telemac/v8p2r0) $

Note that the prompt has changed again, this time showing that the TELEMAC virtual environment created with the conda command above is active.

The TELEMAC software is built with two commands:

(/work/workgroup/sw/telemac/v8p2r0) $ config.py
(/work/workgroup/sw/telemac/v8p2r0) $ compile_telemac.py -j 4

The compile may take quite some time. If the commands are successful, then the interactive build is created as a symbolic link to the batch build:

(/work/workgroup/sw/telemac/v8p2r0) $ ln -s batch "$HOMETEL/builds/interactive"

Think of the interactive build as a lightweight copy of, or alias for, the batch build.

Any recompilation of the TELEMAC libraries and executables should be done using the batch build.
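
For example, if the TELEMAC Fortran sources under $HOMETEL/sources are modified later, a rebuild from a fresh shell would look something like the following (a sketch; the same compile_telemac.py invocation as above applies):

$ workgroup -g «investing-entity»
$ vpkg_require telemac/v8p2r0:batch
(/work/workgroup/sw/telemac/v8p2r0) $ compile_telemac.py -j 4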

All done!

That's it. TELEMAC is now built, easily added to a shell using VALET, and ready for work.

Test job

The following test job was furnished by a Caviness user. It uses a set of three Fortran source files implementing a user model.

A job comprises four distinct steps:

  1. Partition the problem according to the parallelism options
    • Number of nodes
    • Number of MPI processes per node
  2. Compile user sources into an executable
    • This step must be done serially on the login node
  3. Run the executable
    • Same MPI parameters as in step 1
  4. Merge the partitioned results
    • Same MPI parameters as in step 1

We set up a test job directory called $WORKDIR/sw/telemac/example containing our steering file, input files, and source code. The job will use a working directory inside this test job directory for all of the intermediate files produced during the run. The standard partition is specified as the queue for the batch steps (those that can run on compute nodes); remember that jobs can be preempted when using the standard partition, so you may want to use your workgroup partition for longer-running jobs.
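
A sketch of that setup, assuming the steering file is t2d_bump_FE.cas as in the commands below; the geometry/boundary files and the three user Fortran sources referenced by the steering file must be copied into the same directory:

$ mkdir -p "$WORKDIR/sw/telemac/example"
$ cd "$WORKDIR/sw/telemac/example"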

Partitioning

Partitioning is done in parallel, and thus should use a compute node and the job scheduler. This implies the telemac/v8p2r0:batch package in the environment:

$ cd "$WORKDIR/sw/telemac/example"
$ vpkg_require telemac/v8p2r0:batch
(/work/workgroup/sw/telemac/v8p2r0) $ runcode.py \
    --use-link --split --workdirectory $(pwd)/test_run \
      --jobname telemac_test-split \
      --ncsize 2 \
      --nctile 2 \
      --ncnode 1 \
      --walltime 1-00:00:00 \
      --queue standard \
    telemac2d -s t2d_bump_FE.cas

A job will be submitted to the job scheduler and the grid and related inputs will be partitioned into two (2) domains – since ncsize = ncnode * nctile is 2 1).
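
The split job can be monitored with the usual Slurm tools, and once it completes the working directory named above (test_run) should contain the partitioned geometry and boundary files, for example:

(/work/workgroup/sw/telemac/v8p2r0) $ squeue -u "$USER"
(/work/workgroup/sw/telemac/v8p2r0) $ ls test_run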

Compilation

Once the problem is partitioned, the executable can be built. This step should not be run on compute nodes since they lack a base development environment in the OS 2).

(/work/workgroup/sw/telemac/v8p2r0) $ vpkg_rollback
$ vpkg_require telemac/v8p2r0:interactive
(/work/workgroup/sw/telemac/v8p2r0) $ runcode.py \
    --use-link --compileonly --workdirectory $(pwd)/test_run \
    telemac2d -s t2d_bump_FE.cas

Run

With the problem partitioned and the executable built successfully, the job can be run. Again, this step uses the batch build.

(/work/workgroup/sw/telemac/v8p2r0) $ vpkg_rollback
$ vpkg_require telemac/v8p2r0:batch
(/work/workgroup/sw/telemac/v8p2r0) $ runcode.py \
      --use-link --run --workdirectory $(pwd)/test_run \
      --jobname telemac_test-run \
      --ncsize 2 \
      --nctile 2 \
      --ncnode 1 \
      --walltime 1-00:00:00 \
      --queue standard \
    telemac2d -s t2d_bump_FE.cas

Merge

If the run is successful, the partitioned results must be merged. This step also uses the batch build.

(/work/workgroup/sw/telemac/v8p2r0) $ vpkg_rollback
$ vpkg_require telemac/v8p2r0:batch
(/work/workgroup/sw/telemac/v8p2r0) $ runcode.py \
    --use-link --merge --workdirectory $(pwd)/test_run \
      --jobname telemac_test-merge \
      --ncsize 2 \
      --nctile 2 \
      --ncnode 1 \
      --walltime 1-00:00:00 \
      --queue standard \
    telemac2d -s t2d_bump_FE.cas

Single-step runs

Clusters that install a full base development environment on all compute nodes do not need to split the run detailed above into four steps. On such systems, the following single-step submission should be permissible:

$ vpkg_require telemac/v8p2r0:batch
(/work/workgroup/sw/telemac/v8p2r0) $ runcode.py \
    --use-link --workdirectory $(pwd)/test_run \
      --jobname telemac_test \
      --ncsize 2 \
      --nctile 2 \
      --ncnode 1 \
      --walltime 1-00:00:00 \
      --queue standard \
    telemac2d -s t2d_bump_FE.cas
1)
Be sure to get those three flags correct, otherwise TELEMAC may choose unexpected numbers.
2)
This step could be run on a node in the devel partition on Caviness. DARWIN nodes all have a full base development environment in the OS, so the jobs can be run in a single step – see the "Single-step runs" section detailing that mode of operation.