====== Python Virtual Environments with mpi4py ======

Most conda channels include copies of the mpi4py module to satisfy dependencies of MPI-parallelized packages. However, mpi4py must be built on top of a native MPI library (such as MPICH, Open MPI, or Intel MPI). As a result, the conda packages always bundle a binary MPI library that was built to generic specifications, often without support for InfiniBand communications or Slurm/Grid Engine integration. For proper functioning, it is recommended that mpi4py always be built on top of one of the MPI libraries IT-RCI provides on a cluster.

===== MPI and Conda Variants =====

In this example we will build the virtual environment on Farber using the ''openmpi/4.0.5'' version of Open MPI and Anaconda for the virtual environment:

  $ vpkg_require openmpi/4.0.5 anaconda/5.2.0:python3
  Adding dependency `ucx/1.9.0` to your environment
  Adding package `openmpi/4.0.5` to your environment
  Adding package `anaconda/5.2.0:python3` to your environment

On Caviness and DARWIN we would likely choose a Python 3 version of the Intel distribution over Anaconda (see ''vpkg_versions intel-oneapi'' or ''vpkg_versions intel-python'') since it automatically enables Intel's distribution channel. That channel includes NumPy built against the Intel MKL library, for example, and other highly-optimized variants of computationally-intensive Python components.

As of November 2020, the majority of packages populating the Intel channel require baseline operating system libraries (like ''glibc'') newer than what Farber provides: a clear example of the binary compatibility issues that are present in conda software distribution.

===== Create a Directory Hierarchy =====

We will be creating a Python virtual environment containing the NumPy and SciPy libraries, into which mpi4py will be added.
In case we need to create additional similar environments in the future, we will set up a directory hierarchy that allows multiple versions to coexist:

  $ mkdir -p ${HOME}/conda-envs/my-sci-app/20201102

Two things to note:

  * As written, the directory hierarchy is created in the user's home directory; ''${HOME}'' could be replaced by ''${WORKDIR}/users/myname'', for example, to create it elsewhere.
  * The current date is used as a version identifier; using the format ''YYYYMMDD'' promotes simple sorting of the versions from oldest to newest.

The directory structure will lend ''my-sci-app'' to straightforward management using VALET.

===== Farber =====

==== Create the Virtual Environment ====

The virtual environment is first populated with all packages that **do not** require mpi4py. Any packages requiring mpi4py must be installed //after// we build and install our local copy of mpi4py in the virtual environment. In this example, neither NumPy nor SciPy requires mpi4py.

The two channel options (''--channel defaults'' and ''--override-channels'') ensure that only the default Anaconda channels are consulted. Otherwise the command could still pick packages from the Intel channel, for example, which would have the binary compatibility issues described above.

  $ conda create --prefix ${HOME}/conda-envs/my-sci-app/20201102 --channel defaults --override-channels python'>=3.7' numpy scipy
  Solving environment: done
     :
  Proceed ([y]/n)? y
     :
  Preparing transaction: done
  Verifying transaction: done
  Executing transaction: done
  #
  # To activate this environment, use:
  # > source activate /home/1001/conda-envs/my-sci-app/20201102
  #
  # To deactivate an active environment, use:
  # > source deactivate
  #

Before building and installing mpi4py, the environment needs to be activated:

  $ source activate /home/1001/conda-envs/my-sci-app/20201102
  (/home/1001/conda-envs/my-sci-app/20201102)$

==== Building mpi4py ====

With the new virtual environment activated, we can now build mpi4py against the local Open MPI library we added to the shell environment.
  (/home/1001/conda-envs/my-sci-app/20201102)$ pip install --no-binary :all: --compile mpi4py
  Collecting mpi4py
    Using cached mpi4py-3.0.3.tar.gz (1.4 MB)
  Skipping wheel build for mpi4py, due to binaries being disabled for it.
  Installing collected packages: mpi4py
      Running setup.py install for mpi4py ... done
  Successfully installed mpi4py-3.0.3

The ''--no-binary :all:'' flag tells pip not to use prebuilt binary packages, which forces mpi4py to be built from source against the MPI library in the current shell environment. The ''--compile'' flag byte-compiles the Python source files in the mpi4py package at install time (versus allowing them to be compiled and cached later).

The environment now includes support for mpi4py linked against the ''openmpi/4.0.5'' library on Farber:

  (/home/1001/conda-envs/my-sci-app/20201102)$ pip list | grep mpi4py
  mpi4py 3.0.3

Additional packages that require mpi4py can now be installed into the environment.

==== VALET Package Definition ====

The new virtual environment can easily be added to your login shell and job runtime environments using VALET.
First, ensure you have your personal VALET package definition directory present:

  $ mkdir -p ${HOME}/.valet
  $ echo ${HOME}/conda-envs/my-sci-app
  /home/1001/conda-envs/my-sci-app

Take note of the path echoed, then create a new file named ''${HOME}/.valet/my-sci-app.vpkg_json'' and add the following text to it:

  {
      "my-sci-app": {
          "prefix": "/home/1001/conda-envs/my-sci-app",
          "description": "Some scientific app project in Python",
          "standard-paths": false,
          "actions": [
              {
                  "action": "source",
                  "order": "failure-first",
                  "success": 0,
                  "script": {
                      "sh": "anaconda-activate.sh"
                  }
              }
          ],
          "versions": {
              "20201102": {
                  "description": "environment built Nov 2, 2020",
                  "dependencies": [
                      "openmpi/4.0.5",
                      "anaconda/5.2.0:python3"
                  ]
              }
          }
      }
  }

Please note:

  - The ''prefix'' path will be different for you.
  - We do not need to tell VALET the full path to each version; the version identifier **is** the subdirectory of ''prefix'' containing that version.
  - If you choose a different version of Open MPI or Anaconda, alter the ''dependencies'' list accordingly.
  - New versions of this project are appended to the ''versions'' dictionary:

  "versions": {
      "20201102": {
          "description": "environment built Nov 2, 2020",
          "dependencies": [
              "openmpi/4.0.5",
              "anaconda/5.2.0:python3"
          ]
      },
      "20201114": {
          "description": "environment built Nov 14, 2020",
          "dependencies": [
              "openmpi/3.1.6",
              "anaconda/5.2.0:python3"
          ]
      }
  }

=== Using the Virtual Environment ===

The versions of the virtual environment declared in the VALET package are listed using the ''vpkg_versions'' command:

  $ vpkg_versions my-sci-app
  Available versions in package (* = default version):
  [/home/1001/.valet/my-sci-app.vpkg_json]
  my-sci-app    Some scientific app project in Python
  * 20201102    environment built Nov 2, 2020

Activating the virtual environment is accomplished using the ''vpkg_require'' command (in your login shell or inside job scripts):

  $ vpkg_require my-sci-app/20201102
  Adding dependency `ucx/1.9.0` to your environment
  Adding dependency `openmpi/4.0.5` to your environment
  Adding dependency `anaconda/5.2.0:python3` to your environment
  Adding package `my-sci-app/20201102` to your environment
  (/home/1001/conda-envs/my-sci-app/20201102)$ which python3
  ~/conda-envs/my-sci-app/20201102/bin/python3
  (/home/1001/conda-envs/my-sci-app/20201102)$ pip list | grep mpi4py
  mpi4py 3.0.3
  (/home/1001/conda-envs/my-sci-app/20201102)$ which mpirun
  /opt/shared/openmpi/4.0.5/bin/mpirun

===== Caviness =====

The steps for completing this work on Caviness are similar to those presented for Farber, beginning with the first part to [[technical:recipes:mpi4py-in-virtualenv#create-a-directory-hierarchy|create a directory hierarchy]]. Here we use newer versions of Open MPI and Anaconda:

  $ vpkg_require openmpi/4.1.4:gcc-12.1.0 anaconda/2024.02
  Adding dependency `libfabric/1.13.2` to your environment
  Adding dependency `binutils/2.35` to your environment
  Adding dependency `gcc/12.1.0` to your environment
  Adding package `openmpi/4.1.4:gcc-12.1.0` to your environment
  Adding package `anaconda/2024.02` to your environment

==== Create the Virtual Environment ====

The virtual environment is first populated with all packages that **do not** require mpi4py. Any packages requiring mpi4py must be installed //after// we build and install our local copy of mpi4py in the virtual environment. In this example, neither NumPy nor SciPy requires mpi4py.

  $ conda create --prefix ${HOME}/conda-envs/my-sci-app/20201102 --channel defaults --override-channels python'>=3.7' numpy scipy
  Collecting package metadata (current_repodata.json): done
  Solving environment: done
     :
  Proceed ([y]/n)? y
     :
  #
  # To activate this environment, use
  #
  #     $ conda activate /home/1001/conda-envs/my-sci-app/20201102
  #
  # To deactivate an active environment, use
  #
  #     $ conda deactivate

Before building and installing mpi4py, the environment needs to be activated:

  $ conda activate /home/1001/conda-envs/my-sci-app/20201102
  (/home/1001/conda-envs/my-sci-app/20201102)$

==== Building mpi4py ====

With the new virtual environment activated, we can now build mpi4py against the local Open MPI library we added to the shell environment. Anaconda tries to use a copy of ''ld'' that is part of the virtual environment in lieu of the system ''ld''. To allow the build to work properly, remove all permissions from that copy so it cannot be used:

  (/home/1001/conda-envs/my-sci-app/20201102)$ chmod 000 /home/1001/conda-envs/my-sci-app/20201102/compiler_compat/ld
  (/home/1001/conda-envs/my-sci-app/20201102)$ pip install --no-binary :all: --compile mpi4py
  Collecting mpi4py
    Using cached mpi4py-4.0.1.tar.gz (466 kB)
  Skipping wheel build for mpi4py, due to binaries being disabled for it.
  Installing collected packages: mpi4py
      Running setup.py install for mpi4py ... done
  Successfully installed mpi4py-4.0.1

The ''--no-binary :all:'' flag tells pip not to use prebuilt binary packages, which forces mpi4py to be built from source against the MPI library in the current shell environment. The ''--compile'' flag byte-compiles the Python source files in the mpi4py package at install time (versus allowing them to be compiled and cached later).

The environment now includes support for mpi4py linked against the ''openmpi/4.1.4:gcc-12.1.0'' library on Caviness:

  (/home/1001/conda-envs/my-sci-app/20201102)$ pip list | grep mpi4py
  mpi4py 4.0.1

Additional packages that require mpi4py can now be installed into the environment.

==== VALET Package Definition ====

The new virtual environment can easily be added to your login shell and job runtime environments using VALET.
First, ensure you have your personal VALET package definition directory present:

  $ mkdir -p ${HOME}/.valet
  $ echo ${HOME}/conda-envs/my-sci-app
  /home/1001/conda-envs/my-sci-app

Take note of the path echoed, then create a new file named ''${HOME}/.valet/my-sci-app.vpkg_yaml'' and add the following text to it. The ''dependencies'' list must name the same Open MPI and Anaconda versions that were loaded when mpi4py was built:

  my-sci-app:
      prefix: /home/1001/conda-envs/my-sci-app
      description: Some scientific app project in Python
      flags:
          - no-standard-paths
      actions:
          - action: source
            script:
                sh: anaconda-activate.sh
            order: failure-first
            success: 0
      versions:
          "20201102":
              description: environment built Nov 2, 2020
              dependencies:
                  - openmpi/4.1.4:gcc-12.1.0
                  - anaconda/2024.02

=== Using the Virtual Environment ===

The versions of the virtual environment declared in the VALET package are listed using the ''vpkg_versions'' command:

  $ vpkg_versions my-sci-app
  Available versions in package (* = default version):
  [/home/1001/.valet/my-sci-app.vpkg_yaml]
  my-sci-app    Some scientific app project in Python
  * 20201102    environment built Nov 2, 2020

Activating the virtual environment is accomplished using the ''vpkg_require'' command (in your login shell or inside job scripts):

  $ vpkg_require my-sci-app/20201102
  Adding dependency `libfabric/1.13.2` to your environment
  Adding dependency `binutils/2.35` to your environment
  Adding dependency `gcc/12.1.0` to your environment
  Adding dependency `openmpi/4.1.4:gcc-12.1.0` to your environment
  Adding dependency `anaconda/2024.02` to your environment
  Adding package `my-sci-app/20201102` to your environment
  (/home/1001/conda-envs/my-sci-app/20201102)$ which python3
  ~/conda-envs/my-sci-app/20201102/bin/python3
  (/home/1001/conda-envs/my-sci-app/20201102)$ pip list | grep mpi4py
  mpi4py 4.0.1
  (/home/1001/conda-envs/my-sci-app/20201102)$ which mpirun
  /opt/shared/openmpi/4.1.4-gcc-12.1.0/bin/mpirun

===== DARWIN =====

The steps for completing this work on DARWIN are similar to those presented for Caviness, beginning with the first part to [[technical:recipes:mpi4py-in-virtualenv#create-a-directory-hierarchy|create a directory hierarchy]].
On DARWIN we will use the Intel oneAPI Python distribution instead of Anaconda:

  $ vpkg_require openmpi/5.0.2:intel-oneapi-2024 intel-oneapi/2024
  Adding dependency `gcc/12.2.0` to your environment
  Adding dependency `intel-oneapi/2024.0.1.46` to your environment
  Adding dependency `ucx/1.13.1` to your environment
  Adding package `openmpi/5.0.2:intel-oneapi-2024` to your environment

==== Create the Virtual Environment ====

The virtual environment is first populated with all packages that **do not** require mpi4py. Any packages requiring mpi4py must be installed //after// we build and install our local copy of mpi4py in the virtual environment. In this example, neither NumPy nor SciPy requires mpi4py.

  $ conda create --prefix ${HOME}/conda-envs/my-sci-app/20240307 --channel intel --override-channels python'>=3.9' numpy scipy
  Collecting package metadata (current_repodata.json): done
  Solving environment: done
     :
  Proceed ([y]/n)? y
     :
  #
  # To activate this environment, use
  #
  #     $ conda activate /home/1006/conda-envs/my-sci-app/20240307
  #
  # To deactivate an active environment, use
  #
  #     $ conda deactivate

Before building and installing mpi4py, the environment needs to be activated:

  $ conda activate /home/1006/conda-envs/my-sci-app/20240307
  (/home/1006/conda-envs/my-sci-app/20240307)$

==== Building mpi4py ====

With the new virtual environment activated, we can now build mpi4py against the local Open MPI library we added to the shell environment.

  (/home/1006/conda-envs/my-sci-app/20240307)$ pip install --no-binary :all: --compile mpi4py
  Collecting mpi4py
    Downloading mpi4py-3.1.5.tar.gz (2.5 MB)
       ━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 2.5/2.5 MB 16.7 MB/s eta 0:00:00
    Installing build dependencies ... done
    Getting requirements to build wheel ... done
    Preparing metadata (pyproject.toml) ... done
  Building wheels for collected packages: mpi4py
    Building wheel for mpi4py (pyproject.toml) ... done
    Created wheel for mpi4py: filename=mpi4py-3.1.5-cp310-cp310-linux_x86_64.whl size=634821 sha256=78a58c10acd22b3cf2ebf9e73b445d6775ac29f3f59c37e63bd16e27b7467ba2
    Stored in directory: /home/1006/.cache/pip/wheels/18/2b/7f/c852523089e9182b45fca50ff56f49a51eeb6284fd25a66713
  Successfully built mpi4py
  Installing collected packages: mpi4py
  Successfully installed mpi4py-3.1.5

The ''--no-binary :all:'' flag tells pip not to use prebuilt binary packages, which forces mpi4py to be built from source against the MPI library in the current shell environment. The ''--compile'' flag byte-compiles the Python source files in the mpi4py package at install time (versus allowing them to be compiled and cached later).

The environment now includes support for mpi4py linked against the ''openmpi/5.0.2:intel-oneapi-2024'' library on DARWIN:

  (/home/1006/conda-envs/my-sci-app/20240307)$ pip list | grep mpi4py
  mpi4py 3.1.5

Additional packages that require mpi4py can now be installed into the environment.

==== VALET Package Definition ====

The new virtual environment can easily be added to your login shell and job runtime environments using VALET.
First, ensure you have your personal VALET package definition directory present:

  $ mkdir -p ${HOME}/.valet
  $ echo ${HOME}/conda-envs/my-sci-app
  /home/1006/conda-envs/my-sci-app

Take note of the path echoed, then create a new file named ''${HOME}/.valet/my-sci-app.vpkg_yaml'' and add the following text to it:

  my-sci-app:
      prefix: /home/1006/conda-envs/my-sci-app
      description: Some scientific app project in Python
      flags:
          - no-standard-paths
      actions:
          - action: source
            script:
                sh: anaconda-activate.sh
            order: failure-first
            success: 0
      versions:
          "20240307":
              description: environment built Mar 7, 2024
              dependencies:
                  - openmpi/5.0.2:intel-oneapi-2024
                  - intel-oneapi/2024

=== Using the Virtual Environment ===

The versions of the virtual environment declared in the VALET package are listed using the ''vpkg_versions'' command:

  $ vpkg_versions my-sci-app
  Available versions in package (* = default version):
  [/home/1006/.valet/my-sci-app.vpkg_yaml]
  my-sci-app    Some scientific app project in Python
  * 20240307    environment built Mar 7, 2024

Activating the virtual environment is accomplished using the ''vpkg_require'' command (in your login shell or inside job scripts):

  $ vpkg_require my-sci-app/20240307
  Adding dependency `gcc/12.2.0` to your environment
  Adding dependency `intel-oneapi/2024.0.1.46` to your environment
  Adding dependency `ucx/1.13.1` to your environment
  Adding dependency `openmpi/5.0.2:intel-oneapi-2024` to your environment
  Adding package `my-sci-app/20240307` to your environment
  (/home/1006/conda-envs/my-sci-app/20240307)$ which python3
  ~/conda-envs/my-sci-app/20240307/bin/python3
  (/home/1006/conda-envs/my-sci-app/20240307)$ pip list | grep mpi4py
  mpi4py 3.1.5
  (/home/1006/conda-envs/my-sci-app/20240307)$ which mpirun
  /opt/shared/openmpi/5.0.2-intel-oneapi-2024/bin/mpirun
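
The checks above confirm that mpi4py is installed and that ''mpirun'' comes from the cluster's Open MPI, but not which MPI library mpi4py itself loads at run time. The following is a minimal sketch of one way to check that from Python. The function name ''describe_mpi4py'' and the dictionary keys are illustrative assumptions, not part of mpi4py or VALET; only ''mpi4py.get_config()'', ''MPI.Get_version()'', and ''MPI.Get_library_version()'' are existing mpi4py calls.

```python
"""Report which MPI library the active environment's mpi4py is using."""


def describe_mpi4py():
    """Return a dict describing mpi4py and its MPI library, or None.

    None is returned when mpi4py cannot be imported, which covers both
    "not installed" and "its MPI shared library could not be loaded".
    The function name and dict keys are hypothetical, chosen for this sketch.
    """
    try:
        import mpi4py
        # Importing the MPI submodule loads and initializes the MPI library.
        from mpi4py import MPI
    except ImportError:
        return None
    return {
        "mpi4py_version": mpi4py.__version__,
        # Version of the MPI standard implemented by the library.
        "mpi_standard": MPI.Get_version(),
        # Vendor banner of the MPI library loaded at run time.
        "mpi_library": MPI.Get_library_version(),
        # Build-time configuration recorded when mpi4py was compiled.
        "build_config": mpi4py.get_config(),
    }


if __name__ == "__main__":
    info = describe_mpi4py()
    if info is None:
        print("mpi4py could not be imported in this environment")
    else:
        for key, value in info.items():
            print(f"{key}: {value}")
```

Saved under a name of your choosing and run with ''python3'' after ''vpkg_require my-sci-app/...'', the ''mpi_library'' line would be expected to name Open MPI at the version listed in the VALET ''dependencies''. If it names a different MPI implementation, the environment is probably using a bundled conda build of mpi4py rather than the one built from source.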