
Python Virtualenv: emcee and pyKLIP

Some portions of this recipe are specific to the IT-RCI HPC clusters (VALET-based environment setup) but otherwise should work on any system.

Choose a directory in which to install one or more virtual environments (virtualenvs). The string «env-name» denotes the name chosen for the virtualenv; this recipe uses mcmc-env. Do not locate this directory under the /lustre/scratch file system; a directory under the workgroup's storage is typically appropriate:

  • If setting up this virtualenv for multiple users in the workgroup, choose ${WORKDIR}/sw/«env-name» as the base directory.
  • If this virtualenv is solely for personal use, choose ${WORKDIR}/users/<username>/sw/«env-name», for example.

Note that these examples assume a standard workgroup storage layout with group-writable sw and users directories at the top level. Create the directory:

$ mkdir "${WORKDIR}/sw/mcmc-env"

The virtualenv will be created using Intel's distribution for Python, but Anaconda would work just as well. One of the cluster's MPI libraries also needs to be loaded into the environment:

$ vpkg_require intel-python/2019u5:python3 openmpi/3.1.3

Make note of these two package identifiers: they will later be dependencies cited in the VALET package definition for this virtualenv.

In order to maintain one or more distinct instances of the mcmc-env virtualenv, each must be given a distinct version. While this could be any unique sequence of characters, it's usually best to adhere to a systematic versioning scheme. For Python virtualenvs, the current date with an extra numerical component (in case multiple instances are created on the same day) is useful: YYYYMMDD.##, consisting of the four-digit year, the zero-padded month and day, and a zero-padded two-digit sequence number. For example, 20200716.01 denotes the first virtualenv created on July 16, 2020.
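The YYYYMMDD.## scheme can be generated mechanically. The following is a minimal sketch, not part of the original recipe: the function name next_version is hypothetical, and it assumes that each existing instance is a subdirectory of the «env-name» directory named after its version.

```python
from datetime import date
from pathlib import Path
import re


def next_version(env_dir, today=None):
    """Return the next unused YYYYMMDD.## identifier under env_dir.

    env_dir is the directory holding the versioned virtualenvs
    (e.g. ${WORKDIR}/sw/mcmc-env); it does not need to exist yet.
    """
    today = today or date.today()
    stamp = today.strftime("%Y%m%d")
    # Match only identifiers created on the same day, e.g. 20200716.01
    pattern = re.compile(rf"^{stamp}\.(\d{{2}})$")
    used = []
    env_path = Path(env_dir)
    if env_path.is_dir():
        for entry in env_path.iterdir():
            match = pattern.match(entry.name)
            if match:
                used.append(int(match.group(1)))
    # First instance of the day is .01; otherwise one past the highest seen
    return f"{stamp}.{max(used, default=0) + 1:02d}"
```

Calling next_version with the path of the mcmc-env directory yields an identifier that can be used in the conda create command.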

Use the conda command to create the virtualenv and populate it with the most basic dependencies:

$ conda create --prefix="${WORKDIR}/sw/mcmc-env/20200716.01" --copy pip python=3 numpy scipy matplotlib astropy

These basic dependencies do not pull in an extensive list of other packages. Most notably, mpi4py is left out at this stage because it must be built against the cluster's MPI library. If the virtualenv build succeeds (and why shouldn't it!), the new virtualenv can be activated and the location of the pip command inside it can be verified:

$ source activate "${WORKDIR}/sw/mcmc-env/20200716.01"
$ which pip
/work/workgroup/sw/mcmc-env/20200716.01/bin/pip
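The same check can be made from inside Python. This is a rough sketch under stated assumptions: the helper names is_within and pip_in_active_env are hypothetical, and the comparison relies on sys.prefix pointing at the activated virtualenv, which holds when the script is run with that virtualenv's interpreter.

```python
import shutil
import sys
from pathlib import Path


def is_within(path, prefix):
    """Return True if path is located inside the directory prefix."""
    try:
        Path(path).resolve().relative_to(Path(prefix).resolve())
        return True
    except ValueError:
        # relative_to() raises ValueError when path is not under prefix
        return False


def pip_in_active_env():
    """Return True if the first pip on PATH lives under sys.prefix."""
    pip_path = shutil.which("pip")
    return pip_path is not None and is_within(pip_path, sys.prefix)


if __name__ == "__main__":
    print("pip found at:", shutil.which("pip"))
    print("belongs to this environment:", pip_in_active_env())
```

If the second line reports False, pip installs would go to a different Python installation than the one intended.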

For the sake of the emcee and pyKLIP packages it is recommended that a few components be updated or installed first:

$ pip install -U setuptools setuptools_scm pep517 pytest

The pytest package is a soft requirement of the debrisdiskfm package. Next, install from source the mpi4py package as well as a few others:

$ pip install --no-binary :all: mpi4py corner image_registration git+https://github.com/seawander/DebrisDiskFM.git#egg=Package schwimmbad

The image_registration package is another soft requirement of the debrisdiskfm package. This may take some time since binary parts of the packages must be compiled and linked (e.g. the shared library containing hooks into the cluster MPI library for mpi4py) before being copied into the virtualenv.

The pyKLIP package must be installed from a cloned Git source tree. The emcee package is a requirement of pyKLIP and will be installed automatically via pip (see footnote 1).

Rather than using the head of the git source tree – corresponding to ongoing revisions/bug-fixes and not a specific release of the package – the most recent release tag will be used:

$ git clone https://bitbucket.org/pyKLIP/pyklip.git pyklip-src
$ cd pyklip-src
$ git checkout v2.1

Installation into the virtualenv proceeds simply:

$ python setup.py install

Followed by removal of the cloned source:

$ cd ..
$ rm -rf pyklip-src

At this point the packages present in the completed virtualenv can be listed:

$ pip list
Package            Version
------------------ -------------------
astropy            4.0.1.post1
certifi            2020.6.20
corner             2.1.0
cycler             0.10.0
debrisdiskfm       0.1.0
emcee              3.0.2
FITS-tools         0.2
gqueue             3.0.3
image-registration 0.2.4
kiwisolver         1.2.0
matplotlib         3.2.2
mkl-fft            1.1.0
mkl-random         1.1.1
mkl-service        2.3.0
mpi4py             3.0.3
numpy              1.18.5
pep517             0.8.2
pip                20.1.1
pyklip             2.1
pyparsing          2.4.7
python-dateutil    2.8.1
schwimmbad         0.3.1
scipy              1.5.0
setuptools         49.2.0.post20200714
setuptools-scm     4.1.2
sip                4.19.13
six                1.15.0
toml               0.10.1
tornado            6.0.4
wheel              0.34.2

Version numbers may differ as newer versions of the component packages are released, but at the very least the

  • emcee
  • debrisdiskfm
  • schwimmbad
  • astropy
  • matplotlib
  • numpy
  • pyklip

packages should all be present in the list.
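Rather than scanning the pip list output by eye, the presence of these packages can be checked programmatically. The sketch below is an illustration only: the function name missing_packages is hypothetical, and it assumes Python 3.8 or newer, where importlib.metadata is part of the standard library. On an older interpreter the pip list output remains the reference.

```python
from importlib import metadata

# Distribution names taken from the list in this recipe
REQUIRED = [
    "emcee",
    "debrisdiskfm",
    "schwimmbad",
    "astropy",
    "matplotlib",
    "numpy",
    "pyklip",
]


def missing_packages(names=REQUIRED):
    """Return the subset of names not installed in the running environment."""
    missing = []
    for name in names:
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    absent = missing_packages()
    if absent:
        print("Missing packages:", ", ".join(absent))
    else:
        print("All required packages are present.")
```

The script must be run with the python3 of the activated virtualenv so that it inspects the right set of installed packages.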

Return the shell environment to its original state before proceeding:

$ vpkg_rollback all

To simplify the addition of this virtualenv to the runtime environment, a VALET package definition file should be created. If the virtualenv was installed for use by one or more members of the workgroup, the file should be created as ${WORKDIR}/sw/valet/«env-name».vpkg_yaml. If the virtualenv is for personal use, create the file as ~/.valet/«env-name».vpkg_yaml. Naturally, you must ensure that the directory (${WORKDIR}/sw/valet or ~/.valet, respectively) exists prior to creating the package definition file.

The package definition file uses the YAML format. Information to know before proceeding:

Item         Description                                                        Value in this example
-----------  -----------------------------------------------------------------  -------------------------------------------
«env-name»   The name of the virtualenv created                                 mcmc-env
«prefix»     The directory that contains the «env-name» directory               /work/workgroup/sw
«version»    The version identifier chosen for this instance of the virtualenv  20200716.01
«vpkg-file»  The path to the virtualenv package definition file                 /work/workgroup/sw/valet/mcmc-env.vpkg_yaml

The package definition file should follow this format:

«env-name»:
    prefix: «prefix»/«env-name»
    description: emcee and pyKLIP python environments
    flags:
        - no-standard-paths
    default-version: "«version»"
    actions:
        - action: source
          script:
              sh: anaconda-activate.sh
          success: 0
    versions:
        "«version»":
            description: July 16, 2020, build 01
            dependencies:
                - openmpi/3.1.3
                - intel-python/2019u5:python3

For the virtualenv created in the course of this recipe, the resulting file «vpkg-file» would look like:

mcmc-env:
    prefix: /work/workgroup/sw/mcmc-env
    description: emcee and pyKLIP python environments
    flags:
        - no-standard-paths
    default-version: "20200716.01"
    actions:
        - action: source
          script:
              sh: anaconda-activate.sh
          success: 0
    versions:
        "20200716.01":
            description: July 16, 2020, build 01
            dependencies:
                - openmpi/3.1.3
                - intel-python/2019u5:python3
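Substituting the values from the table into the template can also be scripted. The following is a minimal sketch: the function name render_vpkg is hypothetical, and the keys and indentation are copied from the template in this recipe rather than from any VALET reference.

```python
def render_vpkg(env_name, prefix, version, description, dependencies):
    """Return the text of a VALET package definition for one virtualenv version."""
    lines = [
        f"{env_name}:",
        f"    prefix: {prefix}/{env_name}",
        "    description: emcee and pyKLIP python environments",
        "    flags:",
        "        - no-standard-paths",
        # Version strings are quoted so YAML does not read them as numbers
        f'    default-version: "{version}"',
        "    actions:",
        "        - action: source",
        "          script:",
        "              sh: anaconda-activate.sh",
        "          success: 0",
        "    versions:",
        f'        "{version}":',
        f"            description: {description}",
        "            dependencies:",
    ]
    lines += [f"                - {dep}" for dep in dependencies]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_vpkg(
        "mcmc-env",
        "/work/workgroup/sw",
        "20200716.01",
        "July 16, 2020, build 01",
        ["openmpi/3.1.3", "intel-python/2019u5:python3"],
    ), end="")
```

The printed text can be redirected into «vpkg-file» and then checked for syntax.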

The file can be checked for proper syntax:

$ vpkg_check "«vpkg-file»"

If all is okay, then the virtualenv can be loaded in login sessions or in job scripts as:

$ vpkg_require «env-name»/«version»

which for this recipe is:

$ vpkg_require mcmc-env/20200716.01
Adding dependency `libfabric/1.6.1` to your environment
Adding dependency `openmpi/3.1.3` to your environment
Adding dependency `intel-python/2019u5:python3` to your environment
Adding package `mcmc-env/20200716.01` to your environment
$ which python3
/work/workgroup/sw/mcmc-env/20200716.01/bin/python3
$ which pip
/work/workgroup/sw/mcmc-env/20200716.01/bin/pip

Each new instance of the virtualenv must receive a unique version identifier before it is created. Once it has been built, add a corresponding entry to the versions dictionary in the package definition file:

mcmc-env:
    prefix: /work/workgroup/sw/mcmc-env
    description: emcee and pyKLIP python environments
    flags:
        - no-standard-paths
    default-version: "20200716.01"
    actions:
        - action: source
          script:
              sh: anaconda-activate.sh
          success: 0
    versions:
        "20200716.01":
            description: July 16, 2020, build 01
            dependencies:
                - openmpi/3.1.3
                - intel-python/2019u5:python3
        "20200716.02":
            description: July 16, 2020, build 02
            dependencies:
                - openmpi/4.0.2
                - intel-python/2019u5:python3

where the dependencies will change if different versions of the Open MPI library or Intel distribution for Python were used when creating the virtualenv.
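Adding a version entry can be sketched in code as well. This example is illustrative only: the function name add_version is hypothetical, and it assumes the versions dictionary is the last key in the file (as it is in this recipe), so that the new entry can simply be appended to the end of the text.

```python
def add_version(vpkg_text, version, description, dependencies):
    """Return vpkg_text with a new entry appended to the versions dictionary.

    Assumes 'versions:' is the final key of the package definition, so the
    new entry can be appended at the end using the same indentation.
    """
    if f'"{version}":' in vpkg_text:
        raise ValueError(f"version {version} is already defined")
    entry = [
        f'        "{version}":',
        f"            description: {description}",
        "            dependencies:",
    ]
    entry += [f"                - {dep}" for dep in dependencies]
    return vpkg_text.rstrip("\n") + "\n" + "\n".join(entry) + "\n"
```

For the example above, the call would pass "20200716.02", "July 16, 2020, build 02", and the list ["openmpi/4.0.2", "intel-python/2019u5:python3"]. Running vpkg_check on the updated file is still advisable.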

If the new version should be promoted to being the default (e.g. when vpkg_require mcmc-env/default is used) then alter the default-version value accordingly.
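Changing the default can be done with a one-line edit of the file. As a sketch under assumptions (the function name set_default_version is hypothetical, and the file is treated as plain text with a single default-version line, as in this recipe):

```python
import re


def set_default_version(vpkg_text, version):
    """Return vpkg_text with default-version pointing at version."""
    # Refuse to promote a version that has no entry in the file
    if f'"{version}":' not in vpkg_text:
        raise ValueError(f"version {version} is not defined in the file")
    new_text, count = re.subn(
        r"^([ \t]*default-version:[ \t]*).*$",
        lambda m: f'{m.group(1)}"{version}"',
        vpkg_text,
        count=1,
        flags=re.MULTILINE,
    )
    if count == 0:
        raise ValueError("no default-version line found")
    return new_text
```

Because the function checks that the version has an entry first, a typo in the identifier raises an error instead of leaving the package without a usable default.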


1)
The emcee package is pure Python, without any binary components that would require compiling and linking, so it is acceptable to let setuptools satisfy the dependency using pip. This is not true of all of the pyKLIP required packages, though.
  • technical/recipes/emcee-in-virtualenv.txt
  • Last modified: 2021-02-24 23:45
  • by frey