Python Virtual Environments with mpi4py

Most conda channels include copies of the mpi4py module to satisfy dependencies of MPI-parallelized packages. However, mpi4py must be built on top of a native MPI library (such as MPICH, Open MPI, or Intel MPI), so the conda packages always bundle a binary MPI library built to generic specifications, often without support for InfiniBand communications or integration with the Slurm or Grid Engine job schedulers. For proper functioning, we recommend that mpi4py always be built on top of one of the MPI libraries IT-RCI provides on a cluster.

In this example we build the virtual environment on Farber using the openmpi/4.0.5 version of Open MPI and the anaconda/5.2.0:python3 version of Anaconda:

$ vpkg_require openmpi/4.0.5 anaconda/5.2.0:python3
Adding dependency `ucx/1.9.0` to your environment
Adding package `openmpi/4.0.5` to your environment
Adding package `anaconda/5.2.0:python3` to your environment

On Caviness and DARWIN we would likely choose a Python 3 version of the Intel distribution over Anaconda (see vpkg_versions intel-oneapi or vpkg_versions intel-python for what is available), since it automatically enables Intel's distribution channel. That channel includes, for example, Numpy built against the Intel MKL library, along with other highly optimized variants of computationally intensive Python components.

As of November 2020, the majority of packages populating the Intel channel require baseline operating system libraries (like glibc) newer than what Farber provides: a clear example of the binary compatibility issues that are present in conda software distribution.

We will create a Python virtual environment containing the Numpy and Scipy libraries, into which mpi4py will be added. In case we need to create additional similar environments in the future, we will set up a directory hierarchy that allows multiple versions to coexist:

$ mkdir -p ${HOME}/conda-envs/my-sci-app/20201102

Two things to note:

  • As written the directory hierarchy is created in the user's home directory; ${HOME} could be replaced by ${WORKDIR}/users/myname, for example, to create it elsewhere.
  • The current date is used as a version identifier; using the format YYYYMMDD promotes simple sorting of the versions from oldest to newest.
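
The sorting property of the YYYYMMDD format can be illustrated with a short Python sketch. The helper name version_id is hypothetical and not part of any cluster tooling; the dates are the ones used as version identifiers on this page.

```python
from datetime import date


def version_id(day: date) -> str:
    """Format a date as a YYYYMMDD version identifier (hypothetical helper)."""
    return day.strftime("%Y%m%d")


# Identifiers for three builds, deliberately listed out of order.
versions = [
    version_id(date(2020, 11, 14)),
    version_id(date(2020, 11, 2)),
    version_id(date(2024, 3, 7)),
]

# Plain string sorting orders them oldest to newest because the year comes
# first and the month and day are zero-padded to a fixed width.
print(sorted(versions))  # ['20201102', '20201114', '20240307']
```

A format such as MMDDYYYY would not have this property, since string order would then follow the month rather than the year.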

The directory structure will lend my-sci-app to straightforward management using VALET.

The virtual environment is first populated with all packages that do not require mpi4py. Any packages requiring mpi4py must be installed after we build and install our local copy of mpi4py in the virtual environment. In this example, neither Numpy nor Scipy require mpi4py.

The --channel defaults and --override-channels options in the following command ensure that only the default Anaconda channels are consulted; otherwise conda could still pick packages from the Intel channel, for example, which would reintroduce the binary compatibility issues.

$ conda create --prefix ${HOME}/conda-envs/my-sci-app/20201102 --channel defaults --override-channels python'>=3.7' numpy scipy
Solving environment: done
    :
Proceed ([y]/n)? y
    :
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use:
# > source activate /home/1001/conda-envs/my-sci-app/20201102
#
# To deactivate an active environment, use:
# > source deactivate
#

Before building and installing mpi4py the environment needs to be activated:

$ source activate /home/1001/conda-envs/my-sci-app/20201102
(/home/1001/conda-envs/my-sci-app/20201102)$ 

With the new virtual environment activated, we can now build mpi4py against the local Open MPI library we added to the shell environment.

(/home/1001/conda-envs/my-sci-app/20201102)$ pip install --no-binary :all: --compile mpi4py
Collecting mpi4py
  Using cached mpi4py-3.0.3.tar.gz (1.4 MB)
Skipping wheel build for mpi4py, due to binaries being disabled for it.
Installing collected packages: mpi4py
    Running setup.py install for mpi4py ... done
Successfully installed mpi4py-3.0.3

The --no-binary :all: flag tells pip not to use prebuilt binary packages (wheels), forcing mpi4py to be rebuilt from source. The --compile flag compiles the Python source files in the mpi4py package to bytecode at install time (rather than leaving them to be compiled and cached on first import). The environment now includes mpi4py linked against the openmpi/4.0.5 library on Farber:

(/home/1001/conda-envs/my-sci-app/20201102)$ pip list | grep mpi4py
mpi4py      3.0.3

Additional packages that require mpi4py can now be installed into the environment.
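
Beyond pip list, it can be useful to confirm which MPI library the freshly built module actually loads. The following is a minimal sketch, assuming it is run with the python3 of the activated environment; the function name describe_mpi_build is hypothetical. It relies only on mpi4py.__version__ and MPI.Get_library_version(), and returns None when mpi4py cannot be imported.

```python
def describe_mpi_build():
    """Return (mpi4py version, MPI library banner), or None without mpi4py."""
    try:
        import mpi4py
        from mpi4py import MPI  # importing MPI loads the native MPI library
    except ImportError:
        return None
    return mpi4py.__version__, MPI.Get_library_version().strip()


info = describe_mpi_build()
if info is None:
    print("mpi4py is not importable in this Python environment")
else:
    print("mpi4py", info[0])
    print(info[1])  # banner text reported by the underlying MPI library
```

If the build picked up the intended library, the banner should name the Open MPI release that was added with vpkg_require rather than a bundled generic MPI.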

The new virtual environment can easily be added to your login shell and job runtime environments using VALET. First, ensure you have your personal VALET package definition directory present:

$ mkdir -p ${HOME}/.valet
$ echo ${HOME}/conda-envs/my-sci-app
/home/1001/conda-envs/my-sci-app

Take note of the path echoed, then create a new file named ${HOME}/.valet/my-sci-app.vpkg_json and add the following text to it:

{   "my-sci-app": {
        "prefix": "/home/1001/conda-envs/my-sci-app",
        "description": "Some scientific app project in Python",
        "standard-paths": false,
        "actions": [
            { "action": "source", "order": "failure-first", "success": 0,
              "script": { "sh": "anaconda-activate.sh" }
            }
        ],
        "versions": {
            "20201102": {
                "description": "environment built Nov 2, 2020",
                "dependencies": [ "openmpi/4.0.5", "anaconda/5.2.0:python3" ]
            }
        }
    }
}

Please note:

  1. The prefix path will be different for you
  2. We do not need to tell VALET the full path to each version; the version identifier is the subdirectory or prefix containing that version
  3. If you choose a different version of Open MPI or Anaconda, alter the dependencies list accordingly
  4. New versions of this project are appended to the versions dictionary:
            "versions": {
                "20201102": {
                    "description": "environment built Nov 2, 2020",
                    "dependencies": [ "openmpi/4.0.5", "anaconda/5.2.0:python3" ]
                },
                "20201114": {
                    "description": "environment built Nov 14, 2020",
                    "dependencies": [ "openmpi/3.1.6", "anaconda/5.2.0:python3" ]
                }
            }
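
For those who prefer to script item 4 rather than edit the file by hand, here is a minimal sketch using Python's standard json module. The function name add_version is hypothetical, and the dictionary simply mirrors the package definition shown above; in practice it would be read from and written back to ${HOME}/.valet/my-sci-app.vpkg_json with json.load and json.dump.

```python
import json


def add_version(package_def, package, version, description, dependencies):
    """Return a copy of a VALET package definition with one more version."""
    updated = json.loads(json.dumps(package_def))  # deep copy via JSON round trip
    versions = updated[package].setdefault("versions", {})
    if version in versions:
        raise ValueError(f"version {version} is already defined")
    versions[version] = {
        "description": description,
        "dependencies": list(dependencies),
    }
    return updated


# The package definition from this page, as a Python dictionary.
package_def = {
    "my-sci-app": {
        "prefix": "/home/1001/conda-envs/my-sci-app",
        "description": "Some scientific app project in Python",
        "standard-paths": False,
        "actions": [
            {"action": "source", "order": "failure-first", "success": 0,
             "script": {"sh": "anaconda-activate.sh"}}
        ],
        "versions": {
            "20201102": {
                "description": "environment built Nov 2, 2020",
                "dependencies": ["openmpi/4.0.5", "anaconda/5.2.0:python3"],
            }
        },
    }
}

updated = add_version(
    package_def, "my-sci-app", "20201114",
    "environment built Nov 14, 2020",
    ["openmpi/3.1.6", "anaconda/5.2.0:python3"],
)
print(json.dumps(updated, indent=4))
```

Returning a modified copy instead of editing in place leaves the original definition untouched if the new version identifier turns out to be a duplicate.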

Using the Virtual Environment

The versions of the virtual environment declared in the VALET package are listed using the vpkg_versions command:

$ vpkg_versions my-sci-app
Available versions in package (* = default version):
 
[/home/1001/.valet/my-sci-app.vpkg_json]
my-sci-app  Some scientific app project in Python
* 20201102  environment built Nov 2, 2020

Activating the virtual environment is accomplished using the vpkg_require command (in your login shell or inside job scripts):

$ vpkg_require my-sci-app/20201102
Adding dependency `ucx/1.9.0` to your environment
Adding dependency `openmpi/4.0.5` to your environment
Adding dependency `anaconda/5.2.0:python3` to your environment
Adding package `my-sci-app/20201102` to your environment
(/home/1001/conda-envs/my-sci-app/20201102)$ which python3
~/conda-envs/my-sci-app/20201102/bin/python3
(/home/1001/conda-envs/my-sci-app/20201102)$ pip list | grep mpi4py
mpi4py      3.0.3
(/home/1001/conda-envs/my-sci-app/20201102)$ which mpirun
/opt/shared/openmpi/4.0.5/bin/mpirun
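
Once the environment is active, a small collective operation is a quick way to check that mpi4py and the MPI library communicate correctly. This is a sketch rather than an official test: the helper expected_rank_sum is hypothetical, and the script falls back to doing nothing when mpi4py is not installed.

```python
def expected_rank_sum(size):
    """Sum of the ranks 0 .. size-1, the value every process should receive."""
    return size * (size - 1) // 2


try:
    from mpi4py import MPI
except ImportError:
    MPI = None  # allows the file to be loaded where mpi4py is not installed

if MPI is not None:
    comm = MPI.COMM_WORLD
    # allreduce combines one value per process; the default operation is a sum.
    total = comm.allreduce(comm.Get_rank())
    status = "ok" if total == expected_rank_sum(comm.Get_size()) else "FAILED"
    print(f"rank {comm.Get_rank()} of {comm.Get_size()}: allreduce {status}")
```

Saved under a name such as smoke_test.py (a hypothetical file name), it could be launched inside a job with something like mpirun -np 4 python3 smoke_test.py, in which case each of the four ranks should report ok.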

The steps for completing this work on Caviness are similar to those presented for Farber, starting with the creation of the directory hierarchy. Here we use the openmpi/4.1.4:gcc-12.1.0 version of Open MPI with the anaconda/2024.02 version of Anaconda:

$ vpkg_require openmpi/4.1.4:gcc-12.1.0 anaconda/2024.02
Adding dependency `libfabric/1.13.2` to your environment
Adding dependency `binutils/2.35` to your environment
Adding dependency `gcc/12.1.0` to your environment
Adding package `openmpi/4.1.4:gcc-12.1.0` to your environment
Adding package `anaconda/2024.02` to your environment

The virtual environment is first populated with all packages that do not require mpi4py. Any packages requiring mpi4py must be installed after we build and install our local copy of mpi4py in the virtual environment. In this example, neither Numpy nor Scipy require mpi4py.

$ conda create --prefix ${HOME}/conda-envs/my-sci-app/20201102 python'>=3.7' numpy scipy
Collecting package metadata (current_repodata.json): done
Solving environment: done
    :
Proceed ([y]/n)? y
    :
#
# To activate this environment, use
#
#     $ conda activate /home/1001/conda-envs/my-sci-app/20201102
#
# To deactivate an active environment, use
#
#     $ conda deactivate

Before building and installing mpi4py the environment needs to be activated:

$ conda activate /home/1001/conda-envs/my-sci-app/20201102
(/home/1001/conda-envs/my-sci-app/20201102)$ 

With the new virtual environment activated, we can now build mpi4py against the local Open MPI library we added to the shell environment.

(/home/1001/conda-envs/my-sci-app/20201102)$ pip install --no-binary :all: --compile mpi4py
Collecting mpi4py
  Using cached mpi4py-4.0.1.tar.gz (466 kB)
Skipping wheel build for mpi4py, due to binaries being disabled for it.
Installing collected packages: mpi4py
    Running setup.py install for mpi4py ... done
Successfully installed mpi4py-4.0.1

However, the build may fail during compilation, with the error output showing a command like the following:

/opt/shared/openmpi/4.1.4-gcc-12.1.0/bin/mpicc -pthread -B /home/1001/conda-envs/my-sci-app/20201102/compiler_compat _configtest.o -o _configtest

The -B option makes the compiler look for ld inside the virtual environment (in compiler_compat) instead of using the system ld. If this happens, remove all permissions from that copy of ld so that it can no longer be used, then retry the build:

(/home/1001/conda-envs/my-sci-app/20201102)$ chmod 000 /home/1001/conda-envs/my-sci-app/20201102/compiler_compat/ld
(/home/1001/conda-envs/my-sci-app/20201102)$ pip install --no-binary :all: --compile mpi4py
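
To see whether an environment still carries a usable compiler_compat/ld before retrying, a short check along these lines may help. It is only a sketch: the function name compat_ld_status is hypothetical, and it assumes the environment is activated so that conda has set the CONDA_PREFIX variable.

```python
import os


def compat_ld_status(env_prefix):
    """Describe the state of <env_prefix>/compiler_compat/ld."""
    ld_path = os.path.join(env_prefix, "compiler_compat", "ld")
    if not os.path.lexists(ld_path):
        return "absent"
    # After chmod 000 the file has no execute permission for anyone.
    return "executable" if os.access(ld_path, os.X_OK) else "disabled"


# CONDA_PREFIX points at the active environment when one is activated.
prefix = os.environ.get("CONDA_PREFIX")
if prefix:
    print(prefix, compat_ld_status(prefix))
else:
    print("no conda environment is active")
```

A result of "executable" means the environment's own ld can still be picked up through -B; "disabled" or "absent" means the system ld would be used instead.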

The --no-binary :all: flag tells pip not to use prebuilt binary packages (wheels), forcing mpi4py to be rebuilt from source. The --compile flag compiles the Python source files in the mpi4py package to bytecode at install time (rather than leaving them to be compiled and cached on first import). The environment now includes mpi4py linked against the openmpi/4.1.4:gcc-12.1.0 library on Caviness:

(/home/1001/conda-envs/my-sci-app/20201102)$ pip list | grep mpi4py
mpi4py      4.0.1

Additional packages that require mpi4py can now be installed into the environment.

The new virtual environment can easily be added to your login shell and job runtime environments using VALET. First, ensure you have your personal VALET package definition directory present:

$ mkdir -p ${HOME}/.valet
$ echo ${HOME}/conda-envs/my-sci-app
/home/1001/conda-envs/my-sci-app

Take note of the path echoed, then create a new file named ${HOME}/.valet/my-sci-app.vpkg_yaml and add the following text to it:

my-sci-app:
    prefix: /home/1001/conda-envs/my-sci-app
    description: Some scientific app project in Python
    flags:
        - no-standard-paths
    actions:
        - action: source
          script:
              sh: anaconda-activate.sh
          order: failure-first
          success: 0
    versions:
          "20201102":
              description: environment built Nov 2, 2020
              dependencies:
                  - openmpi/4.1.4:gcc-12.1.0
                  - anaconda/2024.02

Using the Virtual Environment

The versions of the virtual environment declared in the VALET package are listed using the vpkg_versions command:

$ vpkg_versions my-sci-app
 
Available versions in package (* = default version):
 
[/home/1001/.valet/my-sci-app.vpkg_yaml]
my-sci-app  Some scientific app project in Python
* 20201102  environment built Nov 2, 2020

Activating the virtual environment is accomplished using the vpkg_require command (in your login shell or inside job scripts):

$ vpkg_require my-sci-app/20201102
Adding dependency `libfabric/1.13.2` to your environment
Adding dependency `binutils/2.35` to your environment
Adding dependency `gcc/12.1.0` to your environment
Adding dependency `openmpi/4.1.4:gcc-12.1.0` to your environment
Adding dependency `anaconda/2024.02` to your environment
Adding package `my-sci-app/20201102` to your environment
(/home/1001/conda-envs/my-sci-app/20201102)$ which python3
~/conda-envs/my-sci-app/20201102/bin/python3
(/home/1001/conda-envs/my-sci-app/20201102)$ pip list | grep mpi4py
mpi4py      4.0.1
(/home/1001/conda-envs/my-sci-app/20201102)$ which mpirun
/opt/shared/openmpi/4.1.4-gcc-12.1.0/bin/mpirun

The steps for completing this work on DARWIN are similar to those presented for Caviness, starting with the creation of the directory hierarchy. Here we use the Intel oneAPI Python distribution:

$ vpkg_require openmpi/5.0.2:intel-oneapi-2024 intel-oneapi/2024
Adding dependency `gcc/12.2.0` to your environment
Adding dependency `intel-oneapi/2024.0.1.46` to your environment
Adding dependency `ucx/1.13.1` to your environment
Adding package `openmpi/5.0.2:intel-oneapi-2024` to your environment

The virtual environment is first populated with all packages that do not require mpi4py. Any packages requiring mpi4py must be installed after we build and install our local copy of mpi4py in the virtual environment. In this example, neither Numpy nor Scipy require mpi4py.

$ conda create --prefix ${HOME}/conda-envs/my-sci-app/20240307 --channel intel --override-channels python'>=3.9' numpy scipy
Collecting package metadata (current_repodata.json): done
Solving environment: done
    :
Proceed ([y]/n)? y
    :
#
# To activate this environment, use
#
#     $ conda activate /home/1006/conda-envs/my-sci-app/20240307
#
# To deactivate an active environment, use
#
#     $ conda deactivate

Before building and installing mpi4py the environment needs to be activated:

$ conda activate /home/1006/conda-envs/my-sci-app/20240307
(/home/1006/conda-envs/my-sci-app/20240307)$

With the new virtual environment activated, we can now build mpi4py against the local Open MPI library we added to the shell environment.

(/home/1006/conda-envs/my-sci-app/20240307)$ pip install --no-binary :all: --compile mpi4py
Collecting mpi4py
  Downloading mpi4py-3.1.5.tar.gz (2.5 MB)
     ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.5/2.5 MB 16.7 MB/s eta 0:00:00
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: mpi4py
  Building wheel for mpi4py (pyproject.toml) ... done
  Created wheel for mpi4py: filename=mpi4py-3.1.5-cp310-cp310-linux_x86_64.whl size=634821 sha256=78a58c10acd22b3cf2ebf9e73b445d6775ac29f3f59c37e63bd16e27b7467ba2
  Stored in directory: /home/1006/.cache/pip/wheels/18/2b/7f/c852523089e9182b45fca50ff56f49a51eeb6284fd25a66713
Successfully built mpi4py
Installing collected packages: mpi4py
Successfully installed mpi4py-3.1.5

The --no-binary :all: flag tells pip not to use prebuilt binary packages (wheels), forcing mpi4py to be rebuilt from source. The --compile flag compiles the Python source files in the mpi4py package to bytecode at install time (rather than leaving them to be compiled and cached on first import). The environment now includes mpi4py linked against the openmpi/5.0.2:intel-oneapi-2024 library on DARWIN:

(/home/1006/conda-envs/my-sci-app/20240307)$ pip list | grep mpi4py
mpi4py             3.1.5

Additional packages that require mpi4py can now be installed into the environment.

The new virtual environment can easily be added to your login shell and job runtime environments using VALET. First, ensure you have your personal VALET package definition directory present:

$ mkdir -p ${HOME}/.valet
$ echo ${HOME}/conda-envs/my-sci-app
/home/1006/conda-envs/my-sci-app

Take note of the path echoed, then create a new file named ${HOME}/.valet/my-sci-app.vpkg_yaml and add the following text to it:

my-sci-app:
    prefix: /home/1006/conda-envs/my-sci-app
    description: Some scientific app project in Python
    flags:
        - no-standard-paths
    actions:
        - action: source
          script:
              sh: anaconda-activate.sh
          order: failure-first
          success: 0
    versions:
          "20240307":
              description: environment built Mar 7, 2024
              dependencies:
                  - openmpi/5.0.2:intel-oneapi-2024
                  - intel-oneapi/2024

Using the Virtual Environment

The versions of the virtual environment declared in the VALET package are listed using the vpkg_versions command:

$ vpkg_versions my-sci-app
 
Available versions in package (* = default version):
 
[/home/1006/.valet/my-sci-app.vpkg_yaml]
my-sci-app  Some scientific app project in Python
* 20240307  environment built Mar 7, 2024

Activating the virtual environment is accomplished using the vpkg_require command (in your login shell or inside job scripts):

$ vpkg_require my-sci-app/20240307
Adding dependency `gcc/12.2.0` to your environment
Adding dependency `intel-oneapi/2024.0.1.46` to your environment
Adding dependency `ucx/1.13.1` to your environment
Adding dependency `openmpi/5.0.2:intel-oneapi-2024` to your environment
Adding package `my-sci-app/20240307` to your environment
(/home/1006/conda-envs/my-sci-app/20240307)$ which python3
~/conda-envs/my-sci-app/20240307/bin/python3
(/home/1006/conda-envs/my-sci-app/20240307)$ pip list | grep mpi4py
mpi4py             3.1.5
(/home/1006/conda-envs/my-sci-app/20240307)$ which mpirun
/opt/shared/openmpi/5.0.2-intel-oneapi-2024/bin/mpirun
  • technical/recipes/mpi4py-in-virtualenv.txt
  • Last modified: 2024-11-19 17:13
  • by anita