====== Python Virtual Environments with mpi4py ======

Most conda channels include copies of the mpi4py module to satisfy dependencies of MPI-parallelized packages. But the mpi4py Python code must be built on top of a native MPI library (such as MPICH, Open MPI, or Intel MPI). As a result, the conda packages always include a bundled binary MPI library built to generic specifications, often without support for InfiniBand communications or Slurm/Grid Engine integration. For proper functioning, it is recommended that mpi4py always be built on top of one of the MPI libraries IT-RCI provides on a cluster.

===== MPI and Conda Variants =====

In this example we will build the virtual environment on Farber using the ''openmpi/4.0.5'' version of Open MPI and Anaconda for the virtual environment:

<code bash>
$ vpkg_require openmpi/4.0.5 anaconda/5.2.0:python3
Adding dependency `ucx/1.9.0` to your environment
Adding package `openmpi/4.0.5` to your environment
Adding package `anaconda/5.2.0:python3` to your environment
</code>

<WRAP center round info 60%>
Due to recent announcements regarding Anaconda, and Intel dropping their distribution channel, any documentation referring to Intel's channel will need to be updated. Please use the ''conda-forge'' channel for installations.
</WRAP>

===== Create a Directory Hierarchy =====

We will be creating a Python virtual environment containing the NumPy and SciPy libraries, into which mpi4py will be added.
In case we need to create additional similar environments in the future, we will set up a directory hierarchy that allows multiple versions to coexist:

<code bash>
$ mkdir -p ${HOME}/conda-envs/my-sci-app/20201102
</code>

Two things to note:

  * As written, the directory hierarchy is created in the user's home directory; ''${HOME}'' could be replaced by ''${WORKDIR}/users/myname'', for example, to create it elsewhere.
  * The current date is used as a version identifier; the format ''YYYYMMDD'' makes the versions sort naturally from oldest to newest.

This directory structure lends ''my-sci-app'' to straightforward management using VALET.

===== Farber =====

==== Create the Virtual Environment ====

The virtual environment is first populated with all packages that **do not** require mpi4py. Any packages requiring mpi4py must be installed //after// we build and install our local copy of mpi4py in the virtual environment. In this example, neither NumPy nor SciPy requires mpi4py.

<WRAP center round important 60%>
The two channel options (''--channel defaults'' and ''--override-channels'') ensure that only the default Anaconda channels are consulted. Otherwise the command could still pick packages from the Intel channel, for example, which would still have the binary compatibility issues!
</WRAP>

<code bash>
$ conda create --prefix ${HOME}/conda-envs/my-sci-app/20201102 --channel defaults --override-channels python'>=3.7' numpy scipy
Solving environment: done
  :
Proceed ([y]/n)? y
  :
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use:
# > source activate /home/1001/conda-envs/my-sci-app/20201102
#
# To deactivate an active environment, use:
# > source deactivate
#
</code>

Before building and installing mpi4py, the environment needs to be activated:

<code bash>
$ source activate /home/1001/conda-envs/my-sci-app/20201102
(/home/1001/conda-envs/my-sci-app/20201102)$
</code>

==== Building mpi4py ====

With the new virtual environment activated, we can now build mpi4py against the local Open MPI library we added to the shell environment.

<code bash>
(/home/1001/conda-envs/my-sci-app/20201102)$ pip install --no-binary :all: --compile mpi4py
Collecting mpi4py
  Using cached mpi4py-3.0.3.tar.gz (1.4 MB)
Skipping wheel build for mpi4py, due to binaries being disabled for it.
Installing collected packages: mpi4py
    Running setup.py install for mpi4py ... done
Successfully installed mpi4py-3.0.3
</code>

The ''--no-binary :all:'' flag prohibits the installation of pre-built binary packages, effectively forcing a rebuild of mpi4py from source. The ''--compile'' flag compiles all Python source files in the mpi4py package to bytecode at install time (versus allowing them to be compiled and cached later).

The environment now includes support for mpi4py linked against the ''openmpi/4.0.5'' library on Farber:

<code bash>
(/home/1001/conda-envs/my-sci-app/20201102)$ pip list | grep mpi4py
mpi4py 3.0.3
</code>

Additional packages that require mpi4py can now be installed into the environment.

==== VALET Package Definition ====

The new virtual environment can easily be added to your login shell and job runtime environments using VALET.
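Before wiring the environment into VALET, it can be worth confirming that mpi4py really was built against the cluster's Open MPI rather than a bundled library. The following is a minimal sketch and not part of the original recipe: the helper name ''linked_against'' and the expected marker ''Open MPI v4.0.5'' are assumptions chosen for illustration (the exact text of the library version string varies between MPI builds). It relies on mpi4py's ''MPI.Get_library_version()'' call.

```python
"""Report which MPI library the mpi4py in this environment is linked against."""


def linked_against(library_version, expected):
    """Return True if the MPI library version string mentions `expected`.

    `expected` is a hypothetical marker such as "Open MPI v4.0.5"; adjust it
    to whatever your MPI build actually reports.
    """
    return expected.lower() in library_version.lower()


def main(expected="Open MPI v4.0.5"):
    try:
        # Imported here so the helper above stays usable without MPI.
        from mpi4py import MPI
    except ImportError:
        print("mpi4py is not importable; activate the virtual environment first")
        return None
    version = MPI.Get_library_version()
    first_line = version.splitlines()[0] if version else ""
    print("MPI library:", first_line)
    print("matches expected:", linked_against(version, expected))
    return version


if __name__ == "__main__":
    main()
```

Saved under any file name and run with ''python3'' inside the activated environment, the script prints the first line of the library version string, which should name the Open MPI release loaded through VALET.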
First, ensure your personal VALET package definition directory is present:

<code bash>
$ mkdir -p ${HOME}/.valet
$ echo ${HOME}/conda-envs/my-sci-app
/home/1001/conda-envs/my-sci-app
</code>

Take note of the path echoed, then create a new file named ''${HOME}/.valet/my-sci-app.vpkg_json'' and add the following text to it:

<code json>
{
    "my-sci-app": {
        "prefix": "/home/1001/conda-envs/my-sci-app",
        "description": "Some scientific app project in Python",
        "standard-paths": false,
        "actions": [
            {
                "action": "source",
                "order": "failure-first",
                "success": 0,
                "script": {
                    "sh": "anaconda-activate.sh"
                }
            }
        ],
        "versions": {
            "20201102": {
                "description": "environment built Nov 2, 2020",
                "dependencies": [
                    "openmpi/4.0.5",
                    "anaconda/5.2.0:python3"
                ]
            }
        }
    }
}
</code>

Please note:

  - The ''prefix'' path will be different for you
  - We do not need to tell VALET the full path to each version; the version identifier **is** the subdirectory of ''prefix'' containing that version
  - If you choose a different version of Open MPI or Anaconda, alter the ''dependencies'' list accordingly
  - New versions of this project are appended to the ''versions'' dictionary:<code json>
"versions": {
    "20201102": {
        "description": "environment built Nov 2, 2020",
        "dependencies": [
            "openmpi/4.0.5",
            "anaconda/5.2.0:python3"
        ]
    },
    "20201114": {
        "description": "environment built Nov 14, 2020",
        "dependencies": [
            "openmpi/3.1.6",
            "anaconda/5.2.0:python3"
        ]
    }
}
</code>

=== Using the Virtual Environment ===

The versions of the virtual environment declared in the VALET package are listed using the ''vpkg_versions'' command:

<code bash>
$ vpkg_versions my-sci-app

Available versions in package (* = default version):

[/home/1001/.valet/my-sci-app.vpkg_json]
my-sci-app  Some scientific app project in Python
* 20201102  environment built Nov 2, 2020
</code>

Activating the virtual environment is accomplished using the ''vpkg_require'' command (in your login shell or inside job scripts):

<code bash>
$ vpkg_require my-sci-app/20201102
Adding dependency `ucx/1.9.0` to your environment
Adding dependency `openmpi/4.0.5` to your environment
Adding dependency `anaconda/5.2.0:python3` to your environment
Adding package `my-sci-app/20201102` to your environment
(/home/1001/conda-envs/my-sci-app/20201102)$ which python3
~/conda-envs/my-sci-app/20201102/bin/python3
(/home/1001/conda-envs/my-sci-app/20201102)$ pip list | grep mpi4py
mpi4py 3.0.3
$ which mpirun
/opt/shared/openmpi/4.0.5/bin/mpirun
</code>

===== Caviness =====

The steps for completing this work on Caviness are similar to those presented for Farber, beginning with the same first step to [[technical:recipes:mpi4py-in-virtualenv#create-a-directory-hierarchy|create a directory hierarchy]]. On Caviness we will use the ''openmpi/4.1.4:gcc-12.1.0'' version of Open MPI and ''anaconda/2024.02'':

<code bash>
$ vpkg_require openmpi/4.1.4:gcc-12.1.0 anaconda/2024.02
Adding dependency `libfabric/1.13.2` to your environment
Adding dependency `binutils/2.35` to your environment
Adding dependency `gcc/12.1.0` to your environment
Adding package `openmpi/4.1.4:gcc-12.1.0` to your environment
Adding package `anaconda/2024.02` to your environment
</code>

==== Create the Virtual Environment ====

The virtual environment is first populated with all packages that **do not** require mpi4py. Any packages requiring mpi4py must be installed //after// we build and install our local copy of mpi4py in the virtual environment. In this example, neither NumPy nor SciPy requires mpi4py.

<code bash>
$ conda create --prefix ${HOME}/conda-envs/my-sci-app/20201102 --channel defaults --override-channels python'>=3.7' numpy scipy
Collecting package metadata (current_repodata.json): done
Solving environment: done
  :
Proceed ([y]/n)? y
  :
#
# To activate this environment, use
#
#     $ conda activate /home/1001/conda-envs/my-sci-app/20201102
#
# To deactivate an active environment, use
#
#     $ conda deactivate
</code>

Before building and installing mpi4py, the environment needs to be activated:

<code bash>
$ conda activate /home/1001/conda-envs/my-sci-app/20201102
(/home/1001/conda-envs/my-sci-app/20201102)$
</code>

==== Building mpi4py ====

With the new virtual environment activated, we can now build mpi4py against the local Open MPI library we added to the shell environment. Because Anaconda places its own copy of ''ld'' in the virtual environment, and the build would use it in lieu of the system ''ld'', you need to remove all permissions from that copy for the compile to work properly.

<code bash>
(/home/1001/conda-envs/my-sci-app/20201102)$ chmod 000 /home/1001/conda-envs/my-sci-app/20201102/compiler_compat/ld
(/home/1001/conda-envs/my-sci-app/20201102)$ pip install --no-binary :all: --compile mpi4py
Collecting mpi4py
  Using cached mpi4py-4.0.1.tar.gz (466 kB)
Skipping wheel build for mpi4py, due to binaries being disabled for it.
Installing collected packages: mpi4py
    Running setup.py install for mpi4py ... done
Successfully installed mpi4py-4.0.1
</code>

The ''--no-binary :all:'' flag prohibits the installation of pre-built binary packages, effectively forcing a rebuild of mpi4py from source. The ''--compile'' flag compiles all Python source files in the mpi4py package to bytecode at install time (versus allowing them to be compiled and cached later).

The environment now includes support for mpi4py linked against the ''openmpi/4.1.4:gcc-12.1.0'' library on Caviness:

<code bash>
(/home/1001/conda-envs/my-sci-app/20201102)$ pip list | grep mpi4py
mpi4py 4.0.1
</code>

Additional packages that require mpi4py can now be installed into the environment.

==== VALET Package Definition ====

The new virtual environment can easily be added to your login shell and job runtime environments using VALET.
First, ensure your personal VALET package definition directory is present:

<code bash>
$ mkdir -p ${HOME}/.valet
$ echo ${HOME}/conda-envs/my-sci-app
/home/1001/conda-envs/my-sci-app
</code>

Take note of the path echoed, then create a new file named ''${HOME}/.valet/my-sci-app.vpkg_yaml'' and add the following text to it:

<code yaml>
my-sci-app:
    prefix: /home/1001/conda-envs/my-sci-app
    description: Some scientific app project in Python
    flags:
        - no-standard-paths
    actions:
        - action: source
          script:
              sh: anaconda-activate.sh
          order: failure-first
          success: 0
    versions:
        "20201102":
            description: environment built Nov 2, 2020
            dependencies:
                - openmpi/4.1.4:gcc-12.1.0
                - anaconda/2024.02
</code>

=== Using the Virtual Environment ===

The versions of the virtual environment declared in the VALET package are listed using the ''vpkg_versions'' command:

<code bash>
$ vpkg_versions my-sci-app

Available versions in package (* = default version):

[/home/1001/.valet/my-sci-app.vpkg_yaml]
my-sci-app  Some scientific app project in Python
* 20201102  environment built Nov 2, 2020
</code>

Activating the virtual environment is accomplished using the ''vpkg_require'' command (in your login shell or inside job scripts):

<code bash>
$ vpkg_require my-sci-app/20201102
Adding dependency `libfabric/1.13.2` to your environment
Adding dependency `binutils/2.35` to your environment
Adding dependency `gcc/12.1.0` to your environment
Adding package `openmpi/4.1.4:gcc-12.1.0` to your environment
Adding package `anaconda/2024.02` to your environment
Adding package `my-sci-app/20201102` to your environment
(/home/1001/conda-envs/my-sci-app/20201102)$ which python3
~/conda-envs/my-sci-app/20201102/bin/python3
(/home/1001/conda-envs/my-sci-app/20201102)$ pip list | grep mpi4py
mpi4py 4.0.1
$ which mpirun
/opt/shared/openmpi/4.1.4:gcc-12.1.0/bin/mpirun
</code>

===== DARWIN =====

The steps for completing this work on DARWIN are similar to those presented for Caviness, beginning with the same first step to [[technical:recipes:mpi4py-in-virtualenv#create-a-directory-hierarchy|create a directory hierarchy]]. On DARWIN we will use the ''openmpi/4.1.5:gcc-12.2'' version of Open MPI and ''anaconda/2024.02'':

<code bash>
$ vpkg_require openmpi/4.1.5:gcc-12.2 anaconda/2024.02
Adding dependency `gcc/12.2.0` to your environment
Adding dependency `ucx/1.13.1` to your environment
Adding package `openmpi/4.1.5:gcc-12.2` to your environment
Adding package `anaconda/2024.02:python3` to your environment
</code>

==== Create the Virtual Environment ====

The virtual environment is first populated with all packages that **do not** require mpi4py. Any packages requiring mpi4py must be installed //after// we build and install our local copy of mpi4py in the virtual environment. In this example, neither NumPy nor SciPy requires mpi4py.

<code bash>
$ conda create --prefix ${HOME}/conda-envs/my-sci-app/20250121 --channel defaults --override-channels python'>=3.7' numpy scipy
Collecting package metadata (current_repodata.json): done
Solving environment: done
  :
Proceed ([y]/n)? y
  :
#
# To activate this environment, use
#
#     $ conda activate /home/1006/conda-envs/my-sci-app/20250121
#
# To deactivate an active environment, use
#
#     $ conda deactivate
</code>

Before building and installing mpi4py, the environment needs to be activated:

<code bash>
$ conda activate /home/1006/conda-envs/my-sci-app/20250121
(/home/1006/conda-envs/my-sci-app/20250121)$
</code>

==== Building mpi4py ====

With the new virtual environment activated, we can now build mpi4py against the local Open MPI library we added to the shell environment. As on Caviness, the copy of ''ld'' that Anaconda places in the virtual environment must first have its permissions removed so that the system ''ld'' is used.

<code bash>
(/home/1006/conda-envs/my-sci-app/20250121)$ chmod 000 /home/1006/conda-envs/my-sci-app/20250121/compiler_compat/ld
(/home/1006/conda-envs/my-sci-app/20250121)$ pip install --no-binary :all: --compile mpi4py
Collecting mpi4py
  Downloading mpi4py-4.0.1.tar.gz (466 kB)
  Installing build dependencies ... done
  Getting requirements to build wheel ... done
  Installing backend dependencies ... done
  Preparing metadata (pyproject.toml) ... done
Building wheels for collected packages: mpi4py
  Building wheel for mpi4py (pyproject.toml) ... done
  Created wheel for mpi4py: filename=mpi4py-4.0.1-cp313-cp313-linux_x86_64.whl size=997834 sha256=b09b4fe26c8aa940bdcbdb512960fb73edb9ed9ed698b9455db3e1f3d5b078a5
  Stored in directory: /home/1006/.cache/pip/wheels/27/79/62/f500b54e8b8ce5f5e54e7b84e8695938988ca274117d39983b
Successfully built mpi4py
Installing collected packages: mpi4py
Successfully installed mpi4py-4.0.1
</code>

The ''--no-binary :all:'' flag prohibits the installation of pre-built binary packages, effectively forcing a rebuild of mpi4py from source. The ''--compile'' flag compiles all Python source files in the mpi4py package to bytecode at install time (versus allowing them to be compiled and cached later).

The environment now includes support for mpi4py linked against the ''openmpi/4.1.5:gcc-12.2'' library on DARWIN:

<code bash>
(/home/1006/conda-envs/my-sci-app/20250121)$ pip list | grep mpi4py
mpi4py 4.0.1
</code>

Additional packages that require mpi4py can now be installed into the environment.

==== VALET Package Definition ====

The new virtual environment can easily be added to your login shell and job runtime environments using VALET.
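Before moving on to the VALET definition, a short smoke test can confirm that processes launched by ''mpirun'' actually communicate through the freshly built mpi4py. This is a minimal sketch under assumptions: the script name ''hello_mpi.py'' and the helper ''describe_rank'' are made up for illustration, and on a production cluster MPI programs should normally be run through the job scheduler rather than on a login node.

```python
"""Tiny mpi4py smoke test: every rank reports in, rank 0 prints a summary."""


def describe_rank(rank, size, host):
    """Format one rank's report line (pure helper, needs no MPI)."""
    return f"rank {rank} of {size} on {host}"


def main():
    try:
        from mpi4py import MPI
    except ImportError:
        print("mpi4py is not importable; activate the virtual environment first")
        return
    comm = MPI.COMM_WORLD
    report = describe_rank(comm.Get_rank(), comm.Get_size(),
                           MPI.Get_processor_name())
    # gather() collects one Python object per rank onto rank 0;
    # every other rank receives None.
    reports = comm.gather(report, root=0)
    if comm.Get_rank() == 0:
        for line in reports:
            print(line)


if __name__ == "__main__":
    main()
```

Saved as ''hello_mpi.py'', it could be launched inside a job with something like ''mpirun -np 4 python3 hello_mpi.py''. If every process reports ''of 1'', the ranks are not seeing each other, which usually suggests that mpi4py and ''mpirun'' come from different MPI libraries.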
First, ensure your personal VALET package definition directory is present:

<code bash>
$ mkdir -p ${HOME}/.valet
$ echo ${HOME}/conda-envs/my-sci-app
/home/1006/conda-envs/my-sci-app
</code>

Take note of the path echoed, then create a new file named ''${HOME}/.valet/my-sci-app.vpkg_yaml'' and add the following text to it:

<code yaml>
my-sci-app:
    prefix: /home/1006/conda-envs/my-sci-app
    description: Some scientific app project in Python
    flags:
        - no-standard-paths
    actions:
        - action: source
          script:
              sh: anaconda-activate.sh
          order: failure-first
          success: 0
    versions:
        "20250121":
            description: environment built Jan 21, 2025
            dependencies:
                - openmpi/4.1.5:gcc-12.2
                - anaconda/2024.02
</code>

=== Using the Virtual Environment ===

The versions of the virtual environment declared in the VALET package are listed using the ''vpkg_versions'' command:

<code bash>
$ vpkg_versions my-sci-app

Available versions in package (* = default version):

[/home/1006/.valet/my-sci-app.vpkg_yaml]
my-sci-app  Some scientific app project in Python
* 20250121  environment built Jan 21, 2025
</code>

Activating the virtual environment is accomplished using the ''vpkg_require'' command (in your login shell or inside job scripts):

<code bash>
$ vpkg_require my-sci-app/20250121
Adding dependency `gcc/12.2.0` to your environment
Adding dependency `ucx/1.13.1` to your environment
Adding dependency `openmpi/4.1.5:gcc-12.2` to your environment
Adding dependency `anaconda/2024.02:python3` to your environment
Adding package `my-sci-app/20250121` to your environment
(/home/1006/conda-envs/my-sci-app/20250121)$ which python3
~/conda-envs/my-sci-app/20250121/bin/python3
(/home/1006/conda-envs/my-sci-app/20250121)$ which mpirun
/opt/shared/openmpi/4.1.5-gcc-12.2/bin/mpirun
</code>

technical/recipes/mpi4py-in-virtualenv.txt Last modified: 2025-01-31 15:29 by anita