Python Virtualenv: emcee and pyKLIP

Some portions of this recipe are specific to the IT-RCI HPC clusters (VALET-based environment setup) but otherwise should work on any system.

Choose a directory in which to install one or more virtual environments (virtualenvs). The string «env-name» denotes the name chosen for the virtualenvs; this recipe uses mcmc-env. Do not locate this directory under the /lustre/scratch file system; typically a directory under the workgroup's storage is appropriate:

  • If setting up this virtualenv for multiple users in the workgroup, choose ${WORKDIR}/sw/«env-name» as the base directory.
  • If this virtualenv is solely for personal use, choose ${WORKDIR}/users/<username>/sw/«env-name», for example.

Note that these examples assume a standard workgroup storage layout with group-writable sw and users directories at the top level. Create the directory:

$ mkdir "${WORKDIR}/sw/mcmc-env"

The virtualenv will be created using Intel's distribution for Python, but Anaconda would work just as well. One of the cluster's MPI libraries also needs to be loaded into the environment:

$ vpkg_require intel-python/2019u5:python3 openmpi/3.1.3

Make note of these two package identifiers: they will later be dependencies cited in the VALET package definition for this virtualenv.

In order to maintain one or more distinct instances of the mcmc-env virtualenv, each must be given a distinct version. While this could be any unique sequence of characters, it's usually best to adhere to a consistent versioning scheme. For Python virtualenvs, the current date with an extra numerical component (in case multiple instances are created on the same day) is useful: YYYYMMDD.##, using the four-digit year, the zero-padded month and day numbers, and a zero-padded trailing ##. For example, the first virtualenv created on July 16, 2020 would be 20200716.01.
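
The date-based identifier can be generated rather than typed by hand. The following is a minimal sketch and not part of the original recipe: next_version is a hypothetical helper, and it assumes that every instance of the virtualenv lives in a subdirectory of the base directory named after its version.

```python
import datetime
import os
import re


def next_version(base_dir, today=None):
    """Return the next unused YYYYMMDD.## identifier under base_dir.

    Hypothetical helper for illustration: it assumes each virtualenv
    instance is a subdirectory of base_dir named after its version.
    """
    today = today or datetime.date.today()
    stamp = today.strftime("%Y%m%d")
    pattern = re.compile(r"^" + stamp + r"\.(\d{2})$")
    used = []
    if os.path.isdir(base_dir):
        for name in os.listdir(base_dir):
            match = pattern.match(name)
            if match:
                used.append(int(match.group(1)))
    # Builds are numbered from 01; take one past the highest number in use.
    return "{}.{:02d}".format(stamp, max(used, default=0) + 1)


if __name__ == "__main__":
    # WORKDIR is set by the workgroup environment on the clusters.
    base = os.path.join(os.environ.get("WORKDIR", "."), "sw", "mcmc-env")
    print(next_version(base))
```

If no instance exists yet for the current day, the sketch returns the date followed by .01.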

Use the conda command to create the virtualenv and populate it with the most basic dependencies:

$ conda create --prefix="${WORKDIR}/sw/mcmc-env/20200716.01" --copy pip python=3 numpy scipy matplotlib astropy

These basic dependencies do not pull in an extensive list of other packages; most notably they do not pull in mpi4py, which must instead be built against the cluster's MPI library. If the virtualenv build succeeds (and why shouldn't it!) the new virtualenv can be activated and the location of the pip command inside it can be verified:

$ source activate "${WORKDIR}/sw/mcmc-env/20200716.01"
$ which pip
/work/workgroup/sw/mcmc-env/20200716.01/bin/pip
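
The same check can be made from inside Python, which is useful in scripts that must not run against the wrong interpreter. This is a small sketch using only the standard library; the function names active_prefix_matches and pip_is_inside are illustrative and not part of any package installed by this recipe.

```python
import os
import shutil
import sys


def active_prefix_matches(expected_prefix):
    """Return True if the running interpreter lives in expected_prefix."""
    return os.path.realpath(sys.prefix) == os.path.realpath(expected_prefix)


def pip_is_inside(prefix):
    """Return True if the first pip found on PATH is located under prefix."""
    pip_path = shutil.which("pip")
    if pip_path is None:
        return False
    prefix = os.path.realpath(prefix)
    return os.path.commonpath([os.path.realpath(pip_path), prefix]) == prefix


if __name__ == "__main__":
    # The path below is the example virtualenv created in this recipe.
    expected = os.path.join(
        os.environ.get("WORKDIR", ""), "sw", "mcmc-env", "20200716.01"
    )
    print("interpreter in virtualenv:", active_prefix_matches(expected))
    print("pip in virtualenv:", pip_is_inside(expected))
```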

For the sake of the emcee and pyKLIP packages it is recommended that a few components be updated or installed first:

$ pip install -U setuptools setuptools_scm pep517 pytest

The pytest package is a soft requirement of the debrisdiskfm package. Next, install from source the mpi4py package as well as a few others:

$ pip install --no-binary :all: mpi4py corner image_registration git+https://github.com/seawander/DebrisDiskFM.git#egg=Package schwimmbad

The image_registration package is another soft requirement of the debrisdiskfm package. This may take some time since binary parts of the packages must be compiled and linked (e.g. the shared library containing hooks into the cluster MPI library for mpi4py) before being copied into the virtualenv.
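
Once the build finishes, it can be worth confirming that mpi4py was linked against the intended MPI library. The sketch below is an optional check that is not part of the original recipe; it relies on MPI.Get_library_version() from mpi4py and assumes it is run inside the activated virtualenv with the openmpi VALET package loaded. The helper name mpi_library_version is illustrative.

```python
def mpi_library_version():
    """Return the MPI library banner reported by mpi4py.

    Returns None if mpi4py cannot be imported in the current environment.
    """
    try:
        # Importing mpi4py.MPI initializes the MPI library.
        from mpi4py import MPI
    except ImportError:
        return None
    return MPI.Get_library_version()


if __name__ == "__main__":
    banner = mpi_library_version()
    print(banner if banner is not None else "mpi4py could not be imported")
```

The banner would be expected to name the Open MPI release that was loaded with vpkg_require; if it names a different MPI library, mpi4py was probably not built from source against the cluster's library.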

The pyKLIP package must be installed from a cloned Git source tree (the repository is hosted on Bitbucket). The emcee package is a requirement of pyKLIP and will be installed automatically via pip (see footnote 1).

Rather than using the head of the git source tree – corresponding to ongoing revisions/bug-fixes and not a specific release of the package – the most recent release tag will be used:

$ git clone https://bitbucket.org/pyKLIP/pyklip.git pyklip-src
$ cd pyklip-src
$ git checkout v2.1

Installation into the virtualenv proceeds simply:

$ python setup.py install

Followed by removal of the cloned source:

$ cd ..
$ rm -rf pyklip-src

At this point the packages present in the completed virtualenv can be listed:

$ pip list
Package            Version
------------------ -------------------
astropy            4.0.1.post1
certifi            2020.6.20
corner             2.1.0
cycler             0.10.0
debrisdiskfm       0.1.0
emcee              3.0.2
FITS-tools         0.2
gqueue             3.0.3
image-registration 0.2.4
kiwisolver         1.2.0
matplotlib         3.2.2
mkl-fft            1.1.0
mkl-random         1.1.1
mkl-service        2.3.0
mpi4py             3.0.3
numpy              1.18.5
pep517             0.8.2
pip                20.1.1
pyklip             2.1
pyparsing          2.4.7
python-dateutil    2.8.1
schwimmbad         0.3.1
scipy              1.5.0
setuptools         49.2.0.post20200714
setuptools-scm     4.1.2
sip                4.19.13
six                1.15.0
toml               0.10.1
tornado            6.0.4
wheel              0.34.2

Version numbers may differ as newer versions of the component packages are released, but at the very least the following packages should all be present in the list:

  • emcee
  • debrisdiskfm
  • schwimmbad
  • astropy
  • matplotlib
  • numpy
  • pyklip
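
Rather than scanning the pip list output by eye, the presence of the required packages can be checked programmatically. This is a minimal sketch using the standard library; it assumes that each package's import name matches the name shown by pip list, and missing_modules is a hypothetical helper.

```python
import importlib.util

# Assumption: the import names match the names printed by pip list.
REQUIRED = [
    "emcee",
    "debrisdiskfm",
    "schwimmbad",
    "astropy",
    "matplotlib",
    "numpy",
    "pyklip",
]


def missing_modules(names):
    """Return the subset of names that cannot be located for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules(REQUIRED)
    if missing:
        print("Missing packages: " + ", ".join(missing))
    else:
        print("All required packages are importable.")
```

Run it with the python command from the activated virtualenv so that the check applies to that environment and not to the system interpreter.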

To simplify the addition of this virtualenv to the runtime environment, a VALET package definition file should be created. If the virtualenv was installed for use by one or more members of the workgroup, the file should be created as ${WORKDIR}/sw/valet/«env-name».vpkg_yaml. If the virtualenv is for personal use, create the file as ~/.valet/«env-name».vpkg_yaml. Naturally, you must ensure that the directory (${WORKDIR}/sw/valet or ~/.valet, respectively) exists prior to creating the package definition file.

The package definition file uses YAML. Information to know before proceeding:

Item        Description                                                                         Value in this example
----------  ----------------------------------------------------------------------------------  ---------------------
«env-name»  The name of the virtualenv created                                                  mcmc-env
«prefix»    The directory in which the «env-name» directory to hold the virtualenv was created  /work/workgroup/sw
«version»   The version identifier chosen for this instance of the virtualenv                   20200716.01

The package definition file should follow this format:

«env-name»:
    prefix: «prefix»/«env-name»
    description: emcee and pyKLIP python environments
    default-version: "«version»"
    actions:
        - action: source
          script:
              sh: anaconda-activate.sh
          success: 0
    versions:
        "«version»":
            description: July 16, 2020, build 01
            dependencies:
                - openmpi/3.1.3
                - intel-python/2019u5:python3

For the virtualenv created in the course of this recipe, the resulting file (/work/workgroup/sw/valet/mcmc-env.vpkg_yaml) would look like:

mcmc-env:
    prefix: /work/workgroup/sw/mcmc-env
    description: emcee and pyKLIP python environments
    default-version: "20200716.01"
    actions:
        - action: source
          script:
              sh: anaconda-activate.sh
          success: 0
    versions:
        "20200716.01":
            description: July 16, 2020, build 01
            dependencies:
                - openmpi/3.1.3
                - intel-python/2019u5:python3
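
When several instances are maintained, the substitution of «env-name», «prefix», and «version» into the template can be scripted. The following sketch fills the same template with plain string formatting; render_vpkg and its build_description parameter are hypothetical names introduced here, and the dependency list is hard-coded to the two packages used in this recipe.

```python
VPKG_TEMPLATE = """\
{env_name}:
    prefix: {prefix}/{env_name}
    description: emcee and pyKLIP python environments
    default-version: "{version}"
    actions:
        - action: source
          script:
              sh: anaconda-activate.sh
          success: 0
    versions:
        "{version}":
            description: {build_description}
            dependencies:
                - openmpi/3.1.3
                - intel-python/2019u5:python3
"""


def render_vpkg(env_name, prefix, version, build_description):
    """Fill the package definition template for one virtualenv instance."""
    return VPKG_TEMPLATE.format(
        env_name=env_name,
        prefix=prefix,
        version=version,
        build_description=build_description,
    )


if __name__ == "__main__":
    # Values from this recipe; redirect the output to the .vpkg_yaml file.
    text = render_vpkg(
        "mcmc-env", "/work/workgroup/sw", "20200716.01", "July 16, 2020, build 01"
    )
    print(text, end="")
```

The sketch writes a file with a single version; adding a second instance to an existing package definition still requires editing the versions section by hand.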

1) The emcee package is pure Python, without any binary components that would require compiling and linking, so it is acceptable to let setuptools satisfy the dependency using pip. This is not true of all of the packages that pyKLIP requires, though.
  • Last modified: 2020-07-16 10:25
  • by frey