====== Python Virtual Environments with mpi4py ======

Most conda channels include copies of the mpi4py module to satisfy dependencies of MPI-parallelized packages. But the mpi4py Python code must be built on top of a native MPI library (such as MPICH, Open MPI, or Intel MPI). As a result, the conda packages always include a bundled binary MPI library that was built to generic specifications, often without support for InfiniBand communications or Slurm/Grid Engine integration. For proper functioning, it is recommended that mpi4py always be built on top of one of the MPI libraries IT-RCI provides on a cluster.

===== MPI and Conda Variants =====

In this example we will build the virtual environment on Farber using the ''openmpi/4.0.5'' version of Open MPI and Anaconda for the virtual environment:

<code bash>
$ vpkg_require openmpi/4.0.5 anaconda/5.2.0:python3
Adding dependency `ucx/1.9.0` to your environment
Adding package `openmpi/4.0.5` to your environment
Adding package `anaconda/5.2.0:python3` to your environment
</code>

Due to recent announcements regarding Anaconda, and Intel dropping their distribution channel, any documentation referring to Intel's channel will need to be updated. Please use the ''conda-forge'' channel for installations.

===== Create a Directory Hierarchy =====

We will be creating a Python virtual environment containing the Numpy and Scipy libraries, into which mpi4py will be added. In case we need to create additional similar environments in the future, we will set up a directory hierarchy that allows multiple versions to coexist:

<code bash>
$ mkdir -p ${HOME}/conda-envs/my-sci-app/20201102
</code>

Two things to note:

  * As written, the directory hierarchy is created in the user's home directory; ''${HOME}'' could be replaced by ''${WORKDIR}/users/myname'', for example, to create it elsewhere.
  * The current date is used as a version identifier; using the format ''YYYYMMDD'' promotes simple sorting of the versions from oldest to newest.
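The sorting property of the ''YYYYMMDD'' convention can be illustrated with a short Python sketch. The helper name ''version_dir'' and the two build dates are hypothetical and only mirror the layout used on this page; the point is that plain string sorting of such identifiers matches chronological order.

```python
import datetime
import os


def version_dir(base, when=None):
    """Return base/YYYYMMDD for the given date (default: today)."""
    when = when or datetime.date.today()
    return os.path.join(base, when.strftime("%Y%m%d"))


# Hypothetical base directory, mirroring the layout used on this page.
base = os.path.join(os.path.expanduser("~"), "conda-envs", "my-sci-app")

# Two example build dates, deliberately listed newest first.
versions = [
    os.path.basename(version_dir(base, datetime.date(2020, 11, 14))),
    os.path.basename(version_dir(base, datetime.date(2020, 11, 2))),
]

# String sorting is chronological because the year comes first,
# then the zero-padded month and day.
print(sorted(versions))  # ['20201102', '20201114']
```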
The directory structure will lend ''my-sci-app'' to straightforward management using VALET.

===== Farber =====

==== Create the Virtual Environment ====

The virtual environment is first populated with all packages that **do not** require mpi4py. Any packages requiring mpi4py must be installed //after// we build and install our local copy of mpi4py in the virtual environment. In this example, neither Numpy nor Scipy requires mpi4py.

The two channel options are present to ensure only the default Anaconda channels are consulted -- otherwise the command could still pick packages from the Intel channel, for example, which would still have the binary compatibility issues!

<code bash>
$ conda create --prefix ${HOME}/conda-envs/my-sci-app/20201102 --channel defaults --override-channels 'python>=3.7' numpy scipy
Solving environment: done
   :
Proceed ([y]/n)? y
   :
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use:
# > source activate /home/1001/conda-envs/my-sci-app/20201102
#
# To deactivate an active environment, use:
# > source deactivate
#
</code>

Before building and installing mpi4py the environment needs to be activated:

<code bash>
$ source activate /home/1001/conda-envs/my-sci-app/20201102
(/home/1001/conda-envs/my-sci-app/20201102)$
</code>

==== Building mpi4py ====

With the new virtual environment activated, we can now build mpi4py against the local Open MPI library we added to the shell environment.

<code bash>
(/home/1001/conda-envs/my-sci-app/20201102)$ pip install --no-binary :all: --compile mpi4py
Collecting mpi4py
Using cached mpi4py-3.0.3.tar.gz (1.4 MB)
Skipping wheel build for mpi4py, due to binaries being disabled for it.
Installing collected packages: mpi4py
Running setup.py install for mpi4py ... done
Successfully installed mpi4py-3.0.3
</code>

The ''--no-binary :all:'' flag prohibits the installation of any packages that include binary components, effectively forcing a rebuild of mpi4py from source.
The ''--compile'' flag byte-compiles all Python source files in the mpi4py package at install time (versus allowing them to be compiled and cached later).

The environment now includes support for mpi4py linked against the ''openmpi/4.0.5'' library on Farber:

<code bash>
(/home/1001/conda-envs/my-sci-app/20201102)$ pip list | grep mpi4py
mpi4py 3.0.3
</code>

Additional packages that require mpi4py can now be installed into the environment.

==== VALET Package Definition ====

The new virtual environment can easily be added to your login shell and job runtime environments using VALET. First, ensure you have your personal VALET package definition directory present:

<code bash>
$ mkdir -p ${HOME}/.valet
$ echo ${HOME}/conda-envs/my-sci-app
/home/1001/conda-envs/my-sci-app
</code>

Take note of the path echoed, then create a new file named ''${HOME}/.valet/my-sci-app.vpkg_json'' and add the following text to it:

<code json>
{
    "my-sci-app": {
        "prefix": "/home/1001/conda-envs/my-sci-app",
        "description": "Some scientific app project in Python",
        "standard-paths": false,
        "actions": [
            {
                "action": "source",
                "order": "failure-first",
                "success": 0,
                "script": {
                    "sh": "anaconda-activate.sh"
                }
            }
        ],
        "versions": {
            "20201102": {
                "description": "environment built Nov 2, 2020",
                "dependencies": [
                    "openmpi/4.0.5",
                    "anaconda/5.2.0:python3"
                ]
            }
        }
    }
}
</code>

Please note:

  - The ''prefix'' path will be different for you
  - We do not need to tell VALET the full path to each version; the version identifier **is** the subdirectory of ''prefix'' containing that version
  - If you choose a different version of Open MPI or Anaconda, alter the ''dependencies'' list accordingly
  - New versions of this project are appended to the ''versions'' dictionary:

<code json>
"versions": {
    "20201102": {
        "description": "environment built Nov 2, 2020",
        "dependencies": [
            "openmpi/4.0.5",
            "anaconda/5.2.0:python3"
        ]
    },
    "20201114": {
        "description": "environment built Nov 14, 2020",
        "dependencies": [
            "openmpi/3.1.6",
            "anaconda/5.2.0:python3"
        ]
    }
}
</code>

=== Using the Virtual Environment ===

The versions of the virtual
environment declared in the VALET package are listed using the ''vpkg_versions'' command:

<code bash>
$ vpkg_versions my-sci-app

Available versions in package (* = default version):

[/home/1001/.valet/my-sci-app.vpkg_json]
my-sci-app  Some scientific app project in Python
* 20201102  environment built Nov 2, 2020
</code>

Activating the virtual environment is accomplished using the ''vpkg_require'' command (in your login shell or inside job scripts):

<code bash>
$ vpkg_require my-sci-app/20201102
Adding dependency `ucx/1.9.0` to your environment
Adding dependency `openmpi/4.0.5` to your environment
Adding dependency `anaconda/5.2.0:python3` to your environment
Adding package `my-sci-app/20201102` to your environment
(/home/1001/conda-envs/my-sci-app/20201102)$ which python3
~/conda-envs/my-sci-app/20201102/bin/python3
(/home/1001/conda-envs/my-sci-app/20201102)$ pip list | grep mpi4py
mpi4py 3.0.3
$ which mpirun
/opt/shared/openmpi/4.0.5/bin/mpirun
</code>

===== Caviness =====

The steps for completing this work on Caviness are similar to those presented for Farber, after first following the section above to [[technical:recipes:mpi4py-in-virtualenv#create-a-directory-hierarchy|create a directory hierarchy]]. In this case we will use the ''openmpi/4.1.4:gcc-12.1.0'' version of Open MPI and ''anaconda/2024.02'':

<code bash>
$ vpkg_require openmpi/4.1.4:gcc-12.1.0 anaconda/2024.02
Adding dependency `libfabric/1.13.2` to your environment
Adding dependency `binutils/2.35` to your environment
Adding dependency `gcc/12.1.0` to your environment
Adding package `openmpi/4.1.4:gcc-12.1.0` to your environment
Adding package `anaconda/2024.02` to your environment
</code>

==== Create the Virtual Environment ====

The virtual environment is first populated with all packages that **do not** require mpi4py. Any packages requiring mpi4py must be installed //after// we build and install our local copy of mpi4py in the virtual environment. In this example, neither Numpy nor Scipy requires mpi4py.
<code bash>
$ conda create --prefix ${HOME}/conda-envs/my-sci-app/20201102 --channel defaults --override-channels 'python>=3.7' numpy scipy
Collecting package metadata (current_repodata.json): done
Solving environment: done
   :
Proceed ([y]/n)? y
   :
#
# To activate this environment, use
#
#     $ conda activate /home/1001/conda-envs/my-sci-app/20201102
#
# To deactivate an active environment, use
#
#     $ conda deactivate
</code>

Before building and installing mpi4py the environment needs to be activated:

<code bash>
$ conda activate /home/1001/conda-envs/my-sci-app/20201102
(/home/1001/conda-envs/my-sci-app/20201102)$
</code>

==== Building mpi4py ====

With the new virtual environment activated, we can now build mpi4py against the local Open MPI library we added to the shell environment. Because Anaconda places its own copy of ''ld'' in the virtual environment (under ''compiler_compat'') and tries to use it in lieu of the system ''ld'', you need to remove all permissions from that copy so the build uses the system linker instead.

<code bash>
(/home/1001/conda-envs/my-sci-app/20201102)$ chmod 000 /home/1001/conda-envs/my-sci-app/20201102/compiler_compat/ld
(/home/1001/conda-envs/my-sci-app/20201102)$ pip install --no-binary :all: --compile mpi4py
Collecting mpi4py
Using cached mpi4py-4.0.1.tar.gz (466 kB)
Skipping wheel build for mpi4py, due to binaries being disabled for it.
Installing collected packages: mpi4py
Running setup.py install for mpi4py ... done
Successfully installed mpi4py-4.0.1
</code>

The ''--no-binary :all:'' flag prohibits the installation of any packages that include binary components, effectively forcing a rebuild of mpi4py from source. The ''--compile'' flag byte-compiles all Python source files in the mpi4py package at install time (versus allowing them to be compiled and cached later).

The environment now includes support for mpi4py linked against the ''openmpi/4.1.4:gcc-12.1.0'' library on Caviness:

<code bash>
(/home/1001/conda-envs/my-sci-app/20201102)$ pip list | grep mpi4py
mpi4py 4.0.1
</code>

Additional packages that require mpi4py can now be installed into the environment.
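Note that ''pip list'' only confirms that the package is installed; it does not show which MPI library the build picked up. The following is a minimal sketch of one way to check that from Python, assuming the environment is activated. The function name ''mpi_build_info'' is hypothetical; ''MPI.Get_library_version()'' and ''mpi4py.get_config()'' are part of mpi4py. With the Caviness build described above, the library banner would be expected to mention Open MPI rather than a bundled MPICH.

```python
import importlib.util


def mpi_build_info():
    """Return details about the MPI library behind mpi4py, or None if absent."""
    if importlib.util.find_spec("mpi4py") is None:
        return None
    try:
        import mpi4py
        from mpi4py import MPI  # importing MPI initializes the MPI library
    except ImportError:
        return None
    return {
        "mpi4py_version": mpi4py.__version__,
        # Banner string reported by the underlying MPI implementation.
        "library": MPI.Get_library_version().strip(),
        # Build-time settings recorded by mpi4py (for example the compiler wrapper).
        "build_config": mpi4py.get_config(),
    }


if __name__ == "__main__":
    info = mpi_build_info()
    if info is None:
        print("mpi4py could not be imported in this environment")
    else:
        print("mpi4py", info["mpi4py_version"])
        print(info["library"])
        print(info["build_config"])
```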
==== VALET Package Definition ====

The new virtual environment can easily be added to your login shell and job runtime environments using VALET. First, ensure you have your personal VALET package definition directory present:

<code bash>
$ mkdir -p ${HOME}/.valet
$ echo ${HOME}/conda-envs/my-sci-app
/home/1001/conda-envs/my-sci-app
</code>

Take note of the path echoed, then create a new file named ''${HOME}/.valet/my-sci-app.vpkg_yaml'' and add the following text to it:

<code yaml>
my-sci-app:
    prefix: /home/1001/conda-envs/my-sci-app
    description: Some scientific app project in Python
    flags:
        - no-standard-paths
    actions:
        - action: source
          script:
              sh: anaconda-activate.sh
          order: failure-first
          success: 0
    versions:
        "20201102":
            description: environment built Nov 2, 2020
            dependencies:
                - openmpi/4.1.4:gcc-12.1.0
                - anaconda/2024.02
</code>

=== Using the Virtual Environment ===

The versions of the virtual environment declared in the VALET package are listed using the ''vpkg_versions'' command:

<code bash>
$ vpkg_versions my-sci-app

Available versions in package (* = default version):

[/home/1001/.valet/my-sci-app.vpkg_yaml]
my-sci-app  Some scientific app project in Python
* 20201102  environment built Nov 2, 2020
</code>

Activating the virtual environment is accomplished using the ''vpkg_require'' command (in your login shell or inside job scripts):

<code bash>
$ vpkg_require my-sci-app/20201102
Adding dependency `libfabric/1.13.2` to your environment
Adding dependency `binutils/2.35` to your environment
Adding dependency `gcc/12.1.0` to your environment
Adding package `openmpi/4.1.4:gcc-12.1.0` to your environment
Adding package `anaconda/2024.02` to your environment
Adding package `my-sci-app/20201102` to your environment
(/home/1001/conda-envs/my-sci-app/20201102)$ which python3
~/conda-envs/my-sci-app/20201102/bin/python3
(/home/1001/conda-envs/my-sci-app/20201102)$ pip list | grep mpi4py
mpi4py 4.0.1
$ which mpirun
/opt/shared/openmpi/4.1.4:gcc-12.1.0/bin/mpirun
</code>

===== DARWIN =====

The steps for completing this work on DARWIN are similar to
those presented for Caviness, after first following the section above to [[technical:recipes:mpi4py-in-virtualenv#create-a-directory-hierarchy|create a directory hierarchy]]. In this case we will use the ''openmpi/4.1.5:gcc-12.2'' version of Open MPI and ''anaconda/2024.02'':

<code bash>
$ vpkg_require openmpi/4.1.5:gcc-12.2 anaconda/2024.02
Adding dependency `gcc/12.2.0` to your environment
Adding dependency `ucx/1.13.1` to your environment
Adding package `openmpi/4.1.5:gcc-12.2` to your environment
Adding package `anaconda/2024.02:python3` to your environment
</code>

==== Create the Virtual Environment ====

The virtual environment is first populated with all packages that **do not** require mpi4py. Any packages requiring mpi4py must be installed //after// we build and install our local copy of mpi4py in the virtual environment. In this example, neither Numpy nor Scipy requires mpi4py.

<code bash>
$ conda create --prefix ${HOME}/conda-envs/my-sci-app/20250121 --channel defaults --override-channels 'python>=3.7' numpy scipy
Collecting package metadata (current_repodata.json): done
Solving environment: done
   :
Proceed ([y]/n)? y
   :
#
# To activate this environment, use
#
#     $ conda activate /home/1006/conda-envs/my-sci-app/20250121
#
# To deactivate an active environment, use
#
#     $ conda deactivate
</code>

Before building and installing mpi4py the environment needs to be activated:

<code bash>
$ conda activate /home/1006/conda-envs/my-sci-app/20250121
(/home/1006/conda-envs/my-sci-app/20250121)$
</code>

==== Building mpi4py ====

With the new virtual environment activated, we can now build mpi4py against the local Open MPI library we added to the shell environment. As on Caviness, first remove all permissions from the copy of ''ld'' that Anaconda places in the virtual environment so the build uses the system ''ld''.

<code bash>
(/home/1006/conda-envs/my-sci-app/20250121)$ chmod 000 /home/1006/conda-envs/my-sci-app/20250121/compiler_compat/ld
(/home/1006/conda-envs/my-sci-app/20250121)$ pip install --no-binary :all: --compile mpi4py
Collecting mpi4py
Downloading mpi4py-4.0.1.tar.gz (466 kB)
Installing build dependencies ... done
Getting requirements to build wheel ... done
Installing backend dependencies ... done
Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: mpi4py
Building wheel for mpi4py (pyproject.toml) ... done
Created wheel for mpi4py: filename=mpi4py-4.0.1-cp313-cp313-linux_x86_64.whl size=997834 sha256=b09b4fe26c8aa940bdcbdb512960fb73edb9ed9ed698b9455db3e1f3d5b078a5
Stored in directory: /home/1006/.cache/pip/wheels/27/79/62/f500b54e8b8ce5f5e54e7b84e8695938988ca274117d39983b
Successfully built mpi4py
Installing collected packages: mpi4py
Successfully installed mpi4py-4.0.1
</code>

The ''--no-binary :all:'' flag prohibits the installation of any packages that include binary components, effectively forcing a rebuild of mpi4py from source. The ''--compile'' flag byte-compiles all Python source files in the mpi4py package at install time (versus allowing them to be compiled and cached later).

The environment now includes support for mpi4py linked against the ''openmpi/4.1.5:gcc-12.2'' library on DARWIN:

<code bash>
(/home/1006/conda-envs/my-sci-app/20250121)$ pip list | grep mpi4py
mpi4py 4.0.1
</code>

Additional packages that require mpi4py can now be installed into the environment.

==== VALET Package Definition ====

The new virtual environment can easily be added to your login shell and job runtime environments using VALET.
First, ensure you have your personal VALET package definition directory present:

<code bash>
$ mkdir -p ${HOME}/.valet
$ echo ${HOME}/conda-envs/my-sci-app
/home/1006/conda-envs/my-sci-app
</code>

Take note of the path echoed, then create a new file named ''${HOME}/.valet/my-sci-app.vpkg_yaml'' and add the following text to it:

<code yaml>
my-sci-app:
    prefix: /home/1006/conda-envs/my-sci-app
    description: Some scientific app project in Python
    flags:
        - no-standard-paths
    actions:
        - action: source
          script:
              sh: anaconda-activate.sh
          order: failure-first
          success: 0
    versions:
        "20250121":
            description: environment built Jan 21, 2025
            dependencies:
                - openmpi/4.1.5:gcc-12.2
                - anaconda/2024.02
</code>

=== Using the Virtual Environment ===

The versions of the virtual environment declared in the VALET package are listed using the ''vpkg_versions'' command:

<code bash>
$ vpkg_versions my-sci-app

Available versions in package (* = default version):

[/home/1006/.valet/my-sci-app.vpkg_yaml]
my-sci-app  Some scientific app project in Python
* 20250121  environment built Jan 21, 2025
</code>

Activating the virtual environment is accomplished using the ''vpkg_require'' command (in your login shell or inside job scripts):

<code bash>
$ vpkg_require my-sci-app/20250121
Adding dependency `gcc/12.2.0` to your environment
Adding dependency `ucx/1.13.1` to your environment
Adding dependency `openmpi/4.1.5:gcc-12.2` to your environment
Adding dependency `anaconda/2024.02:python3` to your environment
Adding package `my-sci-app/20250121` to your environment
(/home/1006/conda-envs/my-sci-app/20250121)$ which python3
~/conda-envs/my-sci-app/20250121/bin/python3
(/home/1006/conda-envs/my-sci-app/20250121)$ which mpirun
/opt/shared/openmpi/4.1.5-gcc-12.2/bin/mpirun
</code>
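Once the environment loads through VALET, a small communication test can confirm that Python and ''mpirun'' work together. The script below is a minimal sketch, not part of the recipe itself: the function name ''allreduce_check'' is hypothetical. Every rank contributes its rank number to a sum, and the result is compared with the closed form ''size*(size-1)/2''.

```python
def allreduce_check():
    """Sum all rank numbers with allreduce and compare with the closed form."""
    try:
        from mpi4py import MPI
    except ImportError:
        return "mpi4py is not importable in this environment"
    comm = MPI.COMM_WORLD
    rank, size = comm.Get_rank(), comm.Get_size()
    # Every rank receives the sum 0 + 1 + ... + (size - 1).
    total = comm.allreduce(rank, op=MPI.SUM)
    expected = size * (size - 1) // 2
    status = "ok" if total == expected else "MISMATCH"
    host = MPI.Get_processor_name()
    return f"rank {rank} of {size} on {host}: sum={total} ({status})"


if __name__ == "__main__":
    print(allreduce_check())
```

Saved under a name of your choosing (for example ''mpi_check.py''), the script could be launched inside a job with something like ''mpirun -np 4 python3 mpi_check.py''; each rank should then print one line ending in ''(ok)''.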