Python Virtual Environments with mpi4py
Most conda channels include copies of the mpi4py module to satisfy dependencies of MPI-parallelized packages. However, mpi4py must be built on top of a native MPI library (such as MPICH, Open MPI, or Intel MPI). As a result, the conda packages always include a bundled binary MPI library built to generic specifications, often without support for InfiniBand communications or Slurm/Grid Engine integration. For proper functioning, mpi4py should always be built on top of one of the MPI libraries IT-RCI provides on a cluster.
MPI and Conda Variants
In this example we will build the virtual environment on Farber using the openmpi/4.0.5 version of Open MPI together with Anaconda:
    $ vpkg_require openmpi/4.0.5 anaconda/5.2.0:python3
    Adding dependency `ucx/1.9.0` to your environment
    Adding package `openmpi/4.0.5` to your environment
    Adding package `anaconda/5.2.0:python3` to your environment
On Caviness and DARWIN we would likely choose a Python 3 version of the Intel distribution over Anaconda (see vpkg_versions intel-oneapi or vpkg_versions intel-python for the available versions), since it automatically enables Intel's distribution channel. That channel includes Numpy built against the Intel MKL library, for example, and other highly optimized variants of computationally intensive Python components.
As of November 2020, the majority of packages populating the Intel channel require baseline operating system libraries (like glibc) newer than what Farber provides: a clear example of the binary compatibility issues present in conda software distribution.
Create a Directory Hierarchy
We will create a Python virtual environment containing the Numpy and Scipy libraries, into which mpi4py will be added. In case we need to create additional, similar environments in the future, we will set up a directory hierarchy that allows multiple versions to coexist:
    $ mkdir -p ${HOME}/conda-envs/my-sci-app/20201102
Two things to note:
- As written, the directory hierarchy is created in the user's home directory; ${HOME} could be replaced by ${WORKDIR}/users/myname, for example, to create it elsewhere.
- The current date is used as a version identifier; the format YYYYMMDD promotes simple sorting of the versions from oldest to newest.
This directory structure lends my-sci-app to straightforward management using VALET.
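The sorting property of the YYYYMMDD format can be checked with a short Python sketch. The helper name version_id is hypothetical and used only for illustration; the version strings are the ones that appear in this document.

```python
from datetime import date


def version_id(day=None):
    """Return a YYYYMMDD version identifier for a date (default: today)."""
    day = day or date.today()
    return day.strftime("%Y%m%d")


# Zero-padded YYYYMMDD strings sort chronologically under plain string ordering,
# so no date parsing is needed to find the oldest or newest environment.
versions = ["20201114", "20240307", "20201102"]
ordered = sorted(versions)

print(ordered)
print("newest:", ordered[-1])
print("identifier for Nov 2, 2020:", version_id(date(2020, 11, 2)))
```

The same property is what makes a plain ls of the my-sci-app directory list the environments from oldest to newest.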
Farber
Create the Virtual Environment
The virtual environment is first populated with all packages that do not require mpi4py. Any packages requiring mpi4py must be installed after we build and install our local copy of mpi4py in the virtual environment. In this example, neither Numpy nor Scipy require mpi4py.
The two channel options (--channel defaults and --override-channels) ensure that only the default Anaconda channels are consulted; otherwise the command could still pick packages from the Intel channel, for example, which would reintroduce the binary compatibility issues.
    $ conda create --prefix ${HOME}/conda-envs/my-sci-app/20201102 --channel defaults --override-channels 'python>=3.7' numpy scipy
    Solving environment: done
      :
    Proceed ([y]/n)? y
      :
    Preparing transaction: done
    Verifying transaction: done
    Executing transaction: done
    #
    # To activate this environment, use:
    # > source activate /home/1001/conda-envs/my-sci-app/20201102
    #
    # To deactivate an active environment, use:
    # > source deactivate
    #
Before building and installing mpi4py the environment needs to be activated:
    $ source activate /home/1001/conda-envs/my-sci-app/20201102
    (/home/1001/conda-envs/my-sci-app/20201102)$
Building mpi4py
With the new virtual environment activated, we can now build mpi4py against the local Open MPI library we added to the shell environment.
    (/home/1001/conda-envs/my-sci-app/20201102)$ pip install --no-binary :all: --compile mpi4py
    Collecting mpi4py
      Using cached mpi4py-3.0.3.tar.gz (1.4 MB)
    Skipping wheel build for mpi4py, due to binaries being disabled for it.
    Installing collected packages: mpi4py
        Running setup.py install for mpi4py ... done
    Successfully installed mpi4py-3.0.3
The --no-binary :all: flag prohibits pip from installing prebuilt binary packages (wheels), which forces mpi4py to be built from source against the MPI library present in the shell environment. The --compile flag compiles the Python source files in the mpi4py package to bytecode at install time (versus allowing them to be compiled and cached later). The environment now includes support for mpi4py linked against the openmpi/4.0.5 library on Farber:
    (/home/1001/conda-envs/my-sci-app/20201102)$ pip list | grep mpi4py
    mpi4py     3.0.3
Additional packages that require mpi4py can now be installed into the environment.
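Beyond pip list, it can be useful to confirm which MPI library mpi4py actually loads at run time. The following is a minimal sketch, not part of the original procedure: the function name describe_mpi is hypothetical, and the try/except is only there so the script reports a message instead of a traceback when mpi4py cannot be imported.

```python
def describe_mpi():
    """Return a one-line description of the MPI library behind mpi4py."""
    try:
        import mpi4py
        from mpi4py import MPI
    except (ImportError, RuntimeError) as exc:
        # mpi4py is missing, or its MPI library could not be loaded
        return f"mpi4py is not usable in this environment: {exc}"

    # Get_library_version() reports the MPI implementation loaded at run time
    lines = MPI.Get_library_version().strip().splitlines()
    library = lines[0] if lines else "unknown MPI library"
    return f"mpi4py {mpi4py.__version__} using {library}"


print(describe_mpi())
```

If the build picked up the intended library, the reported string should name Open MPI with the version that was added to the shell environment rather than a generic MPI bundled with a conda package.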
VALET Package Definition
The new virtual environment can easily be added to your login shell and job runtime environments using VALET. First, ensure you have your personal VALET package definition directory present:
    $ mkdir -p ${HOME}/.valet
    $ echo ${HOME}/conda-envs/my-sci-app
    /home/1001/conda-envs/my-sci-app
Take note of the path echoed, then create a new file named ${HOME}/.valet/my-sci-app.vpkg_json and add the following text to it:
    {
        "my-sci-app": {
            "prefix": "/home/1001/conda-envs/my-sci-app",
            "description": "Some scientific app project in Python",
            "standard-paths": false,
            "actions": [
                {
                    "action": "source",
                    "order": "failure-first",
                    "success": 0,
                    "script": {
                        "sh": "anaconda-activate.sh"
                    }
                }
            ],
            "versions": {
                "20201102": {
                    "description": "environment built Nov 2, 2020",
                    "dependencies": [
                        "openmpi/4.0.5",
                        "anaconda/5.2.0:python3"
                    ]
                }
            }
        }
    }
Please note:
- The prefix path will be different for you.
- We do not need to tell VALET the full path to each version; the version identifier is the subdirectory of prefix containing that version.
- If you choose a different version of Open MPI or Anaconda, alter the dependencies list accordingly.
- New versions of this project are appended to the versions dictionary:
    "versions": {
        "20201102": {
            "description": "environment built Nov 2, 2020",
            "dependencies": [
                "openmpi/4.0.5",
                "anaconda/5.2.0:python3"
            ]
        },
        "20201114": {
            "description": "environment built Nov 14, 2020",
            "dependencies": [
                "openmpi/3.1.6",
                "anaconda/5.2.0:python3"
            ]
        }
    }
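Appending a version by hand is simple, but the same edit can be sketched in Python with the standard json module. This is an illustration only: the helper add_version is hypothetical, and the dictionary below is a trimmed stand-in for the contents of my-sci-app.vpkg_json rather than the complete file.

```python
import json


def add_version(package_def, package, version, description, dependencies):
    """Add (or replace) one entry in a package's "versions" dictionary."""
    versions = package_def[package].setdefault("versions", {})
    versions[version] = {
        "description": description,
        "dependencies": list(dependencies),
    }
    return package_def


# Trimmed stand-in for the package definition shown above
package_def = {
    "my-sci-app": {
        "prefix": "/home/1001/conda-envs/my-sci-app",
        "versions": {
            "20201102": {
                "description": "environment built Nov 2, 2020",
                "dependencies": ["openmpi/4.0.5", "anaconda/5.2.0:python3"],
            }
        },
    }
}

add_version(
    package_def,
    "my-sci-app",
    "20201114",
    "environment built Nov 14, 2020",
    ["openmpi/3.1.6", "anaconda/5.2.0:python3"],
)

# json.dumps produces text that can be written back to the .vpkg_json file
print(json.dumps(package_def, indent=4))
```

Round-tripping through json.loads and json.dumps also catches syntax slips, such as a missing comma between versions, before VALET reads the file.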
Using the Virtual Environment
The versions of the virtual environment declared in the VALET package are listed using the vpkg_versions command:
    $ vpkg_versions my-sci-app
    Available versions in package (* = default version):
    [/home/1001/.valet/my-sci-app.vpkg_json]
    my-sci-app    Some scientific app project in Python
    * 20201102    environment built Nov 2, 2020
Activating the virtual environment is accomplished using the vpkg_require command (in your login shell or inside job scripts):
    $ vpkg_require my-sci-app/20201102
    Adding dependency `ucx/1.9.0` to your environment
    Adding dependency `openmpi/4.0.5` to your environment
    Adding dependency `anaconda/5.2.0:python3` to your environment
    Adding package `my-sci-app/20201102` to your environment
    (/home/1001/conda-envs/my-sci-app/20201102)$ which python3
    ~/conda-envs/my-sci-app/20201102/bin/python3
    (/home/1001/conda-envs/my-sci-app/20201102)$ pip list | grep mpi4py
    mpi4py     3.0.3
    (/home/1001/conda-envs/my-sci-app/20201102)$ which mpirun
    /opt/shared/openmpi/4.0.5/bin/mpirun
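With the environment active, a small test program is a quick way to check that Python and mpirun work together. The sketch below is an assumption-laden illustration rather than part of the original procedure: the function name rank_and_size is hypothetical, and the single-process fallback exists only so the script can be smoke-tested where mpi4py is unavailable.

```python
try:
    from mpi4py import MPI
except (ImportError, RuntimeError):
    MPI = None  # fallback for smoke tests on systems without mpi4py


def rank_and_size():
    """Return this process's rank and the total number of MPI processes."""
    if MPI is None:
        return 0, 1  # behave like a single-process run
    comm = MPI.COMM_WORLD
    return comm.Get_rank(), comm.Get_size()


rank, size = rank_and_size()
print(f"Hello from rank {rank} of {size}")
```

Saved under a name of your choosing (hello_mpi.py, say), it could be launched with something like mpirun -np 4 python3 hello_mpi.py; each rank should then print its own line. Inside a job script, follow the cluster's usual conventions for launching MPI programs.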
Caviness
The steps for completing this work on Caviness are similar to those presented for Farber, beginning with the same first part that creates a directory hierarchy. In this example we use the openmpi/4.1.4:gcc-12.1.0 version of Open MPI and the anaconda/2024.02 version of Anaconda:
    $ vpkg_require openmpi/4.1.4:gcc-12.1.0 anaconda/2024.02
    Adding dependency `libfabric/1.13.2` to your environment
    Adding dependency `binutils/2.35` to your environment
    Adding dependency `gcc/12.1.0` to your environment
    Adding package `openmpi/4.1.4:gcc-12.1.0` to your environment
    Adding package `anaconda/2024.02` to your environment
Create the Virtual Environment
The virtual environment is first populated with all packages that do not require mpi4py. Any packages requiring mpi4py must be installed after we build and install our local copy of mpi4py in the virtual environment. In this example, neither Numpy nor Scipy require mpi4py.
    $ conda create --prefix ${HOME}/conda-envs/my-sci-app/20201102 'python>=3.7' numpy scipy
    Collecting package metadata (current_repodata.json): done
    Solving environment: done
      :
    Proceed ([y]/n)? y
      :
    #
    # To activate this environment, use
    #
    #     $ conda activate /home/1001/conda-envs/my-sci-app/20201102
    #
    # To deactivate an active environment, use
    #
    #     $ conda deactivate
Before building and installing mpi4py the environment needs to be activated:
    $ conda activate /home/1001/conda-envs/my-sci-app/20201102
    (/home/1001/conda-envs/my-sci-app/20201102)$
Building mpi4py
With the new virtual environment activated, we can now build mpi4py against the local Open MPI library we added to the shell environment.
    (/home/1001/conda-envs/my-sci-app/20201102)$ pip install --no-binary :all: --compile mpi4py
    Collecting mpi4py
      Using cached mpi4py-4.0.1.tar.gz (466 kB)
    Skipping wheel build for mpi4py, due to binaries being disabled for it.
    Installing collected packages: mpi4py
        Running setup.py install for mpi4py ... done
    Successfully installed mpi4py-4.0.1
However, the build may fail, with the error output pointing at a command like the following:
    /opt/shared/openmpi/4.1.4-gcc-12.1.0/bin/mpicc -pthread -B /home/1001/conda-envs/my-sci-app/20201102/compiler_compat _configtest.o -o _configtest
The failure occurs because the -B option makes the compiler look for a version of ld inside the environment's compiler_compat directory in lieu of the system ld. If this happens, remove all permissions from the environment's copy of ld so that it can no longer be used, then retry the build:
    (/home/1001/conda-envs/my-sci-app/20201102)$ chmod 000 /home/1001/conda-envs/my-sci-app/20201102/compiler_compat/ld
    (/home/1001/conda-envs/my-sci-app/20201102)$ pip install --no-binary :all: --compile mpi4py
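Before retrying, it may help to confirm whether the environment's copy of ld is still usable. The sketch below is illustrative only: the helper bundled_ld_status is hypothetical, and it assumes the active environment's prefix is available in the CONDA_PREFIX variable that conda sets on activation (falling back to sys.prefix otherwise).

```python
import os
import stat
import sys


def bundled_ld_status(env_prefix):
    """Classify <env_prefix>/compiler_compat/ld as absent, active, or disabled."""
    ld_path = os.path.join(env_prefix, "compiler_compat", "ld")
    if not os.path.exists(ld_path):
        return "absent"
    mode = stat.S_IMODE(os.stat(ld_path).st_mode)
    # Any execute bit means the compiler can still pick up this ld via -B
    return "active" if mode & 0o111 else "disabled"


prefix = os.environ.get("CONDA_PREFIX", sys.prefix)
print(f"{prefix}: bundled ld is {bundled_ld_status(prefix)}")
```

A result of disabled after the chmod indicates the permissions change took effect; active means the environment's ld can still be selected during the build.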
The --no-binary :all: flag prohibits pip from installing prebuilt binary packages (wheels), which forces mpi4py to be built from source against the MPI library present in the shell environment. The --compile flag compiles the Python source files in the mpi4py package to bytecode at install time (versus allowing them to be compiled and cached later). The environment now includes support for mpi4py linked against the openmpi/4.1.4:gcc-12.1.0 library on Caviness:
    (/home/1001/conda-envs/my-sci-app/20201102)$ pip list | grep mpi4py
    mpi4py     4.0.1
Additional packages that require mpi4py can now be installed into the environment.
VALET Package Definition
The new virtual environment can easily be added to your login shell and job runtime environments using VALET. First, ensure you have your personal VALET package definition directory present:
    $ mkdir -p ${HOME}/.valet
    $ echo ${HOME}/conda-envs/my-sci-app
    /home/1001/conda-envs/my-sci-app
Take note of the path echoed, then create a new file named ${HOME}/.valet/my-sci-app.vpkg_yaml and add the following text to it:
    my-sci-app:
        prefix: /home/1001/conda-envs/my-sci-app
        description: Some scientific app project in Python
        flags:
            - no-standard-paths
        actions:
            - action: source
              script:
                  sh: anaconda-activate.sh
              order: failure-first
              success: 0
        versions:
            "20201102":
                description: environment built Nov 2, 2020
                dependencies:
                    - openmpi/4.1.4:gcc-12.1.0
                    - anaconda/2024.02
The dependencies list names the same Open MPI and Anaconda packages that were loaded when the environment was built.
Using the Virtual Environment
The versions of the virtual environment declared in the VALET package are listed using the vpkg_versions command:
    $ vpkg_versions my-sci-app
    Available versions in package (* = default version):
    [/home/1001/.valet/my-sci-app.vpkg_yaml]
    my-sci-app    Some scientific app project in Python
    * 20201102    environment built Nov 2, 2020
Activating the virtual environment is accomplished using the vpkg_require command (in your login shell or inside job scripts):
    $ vpkg_require my-sci-app/20201102
    Adding dependency `libfabric/1.13.2` to your environment
    Adding dependency `binutils/2.35` to your environment
    Adding dependency `gcc/12.1.0` to your environment
    Adding dependency `openmpi/4.1.4:gcc-12.1.0` to your environment
    Adding dependency `anaconda/2024.02` to your environment
    Adding package `my-sci-app/20201102` to your environment
    (/home/1001/conda-envs/my-sci-app/20201102)$ which python3
    ~/conda-envs/my-sci-app/20201102/bin/python3
    (/home/1001/conda-envs/my-sci-app/20201102)$ pip list | grep mpi4py
    mpi4py     4.0.1
    (/home/1001/conda-envs/my-sci-app/20201102)$ which mpirun
    /opt/shared/openmpi/4.1.4-gcc-12.1.0/bin/mpirun
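The which checks above can also be made from inside Python, which is handy at the top of a job's driver script. This is a minimal sketch using only the standard library; the function name environment_report is hypothetical.

```python
import shutil
import sys


def environment_report():
    """Collect the interpreter prefix and the executables found on PATH."""
    return {
        "sys.prefix": sys.prefix,             # root of the running Python
        "python3": shutil.which("python3"),   # None if not found on PATH
        "mpirun": shutil.which("mpirun"),     # None if not found on PATH
    }


for name, value in environment_report().items():
    print(f"{name:>10}: {value}")
```

When the VALET package is loaded, sys.prefix and python3 should both point inside the virtual environment directory, while mpirun should resolve to the cluster's Open MPI installation rather than to anything inside the environment.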
DARWIN
The steps for completing this work on DARWIN are similar to those presented for Caviness, beginning with the same first part that creates a directory hierarchy (this example uses 20240307 as the version identifier). We will instead use the Intel oneAPI Python distribution:
    $ vpkg_require openmpi/5.0.2:intel-oneapi-2024 intel-oneapi/2024
    Adding dependency `gcc/12.2.0` to your environment
    Adding dependency `intel-oneapi/2024.0.1.46` to your environment
    Adding dependency `ucx/1.13.1` to your environment
    Adding package `openmpi/5.0.2:intel-oneapi-2024` to your environment
Create the Virtual Environment
The virtual environment is first populated with all packages that do not require mpi4py. Any packages requiring mpi4py must be installed after we build and install our local copy of mpi4py in the virtual environment. In this example, neither Numpy nor Scipy require mpi4py.
    $ conda create --prefix ${HOME}/conda-envs/my-sci-app/20240307 --channel intel --override-channels 'python>=3.9' numpy scipy
    Collecting package metadata (current_repodata.json): done
    Solving environment: done
      :
    Proceed ([y]/n)? y
      :
    #
    # To activate this environment, use
    #
    #     $ conda activate /home/1006/conda-envs/my-sci-app/20240307
    #
    # To deactivate an active environment, use
    #
    #     $ conda deactivate
Before building and installing mpi4py the environment needs to be activated:
    $ conda activate /home/1006/conda-envs/my-sci-app/20240307
    (/home/1006/conda-envs/my-sci-app/20240307)$
Building mpi4py
With the new virtual environment activated, we can now build mpi4py against the local Open MPI library we added to the shell environment.
    (/home/1006/conda-envs/my-sci-app/20240307)$ pip install --no-binary :all: --compile mpi4py
    Collecting mpi4py
      Downloading mpi4py-3.1.5.tar.gz (2.5 MB)
         ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.5/2.5 MB 16.7 MB/s eta 0:00:00
      Installing build dependencies ... done
      Getting requirements to build wheel ... done
      Preparing metadata (pyproject.toml) ... done
    Building wheels for collected packages: mpi4py
      Building wheel for mpi4py (pyproject.toml) ... done
      Created wheel for mpi4py: filename=mpi4py-3.1.5-cp310-cp310-linux_x86_64.whl size=634821 sha256=78a58c10acd22b3cf2ebf9e73b445d6775ac29f3f59c37e63bd16e27b7467ba2
      Stored in directory: /home/1006/.cache/pip/wheels/18/2b/7f/c852523089e9182b45fca50ff56f49a51eeb6284fd25a66713
    Successfully built mpi4py
    Installing collected packages: mpi4py
    Successfully installed mpi4py-3.1.5
The --no-binary :all: flag prohibits pip from installing prebuilt binary packages (wheels), which forces mpi4py to be built from source against the MPI library present in the shell environment. The --compile flag compiles the Python source files in the mpi4py package to bytecode at install time (versus allowing them to be compiled and cached later). The environment now includes support for mpi4py linked against the openmpi/5.0.2:intel-oneapi-2024 library on DARWIN:
    (/home/1006/conda-envs/my-sci-app/20240307)$ pip list | grep mpi4py
    mpi4py             3.1.5
Additional packages that require mpi4py can now be installed into the environment.
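When the build step is scripted, one way to make sure pip belongs to the active environment is to invoke it through the running interpreter. The sketch below only assembles the command; the helper name pip_source_build_command is hypothetical, and whether to pass :all: or just the package name to --no-binary is a judgment call (naming only mpi4py would leave pip free to use wheels for any other packages it needs).

```python
import sys


def pip_source_build_command(package="mpi4py", no_binary=":all:"):
    """Build the argument list for a from-source pip install of one package."""
    return [
        sys.executable, "-m", "pip", "install",
        "--no-binary", no_binary,   # forbid prebuilt wheels
        "--compile",                # compile .py files to bytecode on install
        package,
    ]


cmd = pip_source_build_command()
print(" ".join(cmd))

# To actually run the build (not executed here):
#   import subprocess
#   subprocess.run(cmd, check=True)
```

Using sys.executable -m pip ties the installation to whichever Python is running the script, so the package lands in the activated virtual environment rather than in a different Python found earlier on PATH.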
VALET Package Definition
The new virtual environment can easily be added to your login shell and job runtime environments using VALET. First, ensure you have your personal VALET package definition directory present:
    $ mkdir -p ${HOME}/.valet
    $ echo ${HOME}/conda-envs/my-sci-app
    /home/1006/conda-envs/my-sci-app
Take note of the path echoed, then create a new file named ${HOME}/.valet/my-sci-app.vpkg_yaml and add the following text to it:
    my-sci-app:
        prefix: /home/1006/conda-envs/my-sci-app
        description: Some scientific app project in Python
        flags:
            - no-standard-paths
        actions:
            - action: source
              script:
                  sh: anaconda-activate.sh
              order: failure-first
              success: 0
        versions:
            "20240307":
                description: environment built Mar 7, 2024
                dependencies:
                    - openmpi/5.0.2:intel-oneapi-2024
                    - intel-oneapi/2024
Using the Virtual Environment
The versions of the virtual environment declared in the VALET package are listed using the vpkg_versions command:
    $ vpkg_versions my-sci-app
    Available versions in package (* = default version):
    [/home/1006/.valet/my-sci-app.vpkg_yaml]
    my-sci-app    Some scientific app project in Python
    * 20240307    environment built Mar 7, 2024
Activating the virtual environment is accomplished using the vpkg_require command (in your login shell or inside job scripts):
    $ vpkg_require my-sci-app/20240307
    Adding dependency `gcc/12.2.0` to your environment
    Adding dependency `intel-oneapi/2024.0.1.46` to your environment
    Adding dependency `ucx/1.13.1` to your environment
    Adding dependency `openmpi/5.0.2:intel-oneapi-2024` to your environment
    Adding package `my-sci-app/20240307` to your environment
    (/home/1006/conda-envs/my-sci-app/20240307)$ which python3
    ~/conda-envs/my-sci-app/20240307/bin/python3
    (/home/1006/conda-envs/my-sci-app/20240307)$ pip list | grep mpi4py
    mpi4py             3.1.5
    (/home/1006/conda-envs/my-sci-app/20240307)$ which mpirun
    /opt/shared/openmpi/5.0.2-intel-oneapi-2024/bin/mpirun