====== Python Virtualenv: emcee and pyKLIP ======

Some portions of this recipe are specific to the IT-RCI HPC clusters (VALET-based environment setup) but otherwise should work on any system.

===== Preparations =====

Choose a directory in which to install one or more virtual environments (virtualenvs).  The string ''«env-name»'' will denote the name chosen for the virtualenvs -- this recipe will use ''mcmc-env''.  Do not locate this directory under the ''/lustre/scratch'' file system; typically a directory under the workgroup's storage is appropriate:

  * If setting up this virtualenv for multiple users in the workgroup, choose ''${WORKDIR}/sw/«env-name»'' as the base directory.
  * If this virtualenv is solely for personal use, choose ''${WORKDIR}/users//sw/«env-name»'', for example.

Note that these examples assume a standard workgroup storage layout with group-writable ''sw'' and ''users'' directories at the top level.

Create the directory:

<code bash>
$ mkdir "${WORKDIR}/sw/mcmc-env"
</code>

The virtualenv will be created using Intel's distribution for Python, but Anaconda would work just as well.  One of the cluster's MPI libraries also needs to be loaded into the environment:

<code bash>
$ vpkg_require intel-python/2019u5:python3 openmpi/3.1.3
</code>

Make note of these two package identifiers:  they will later be cited as dependencies in the VALET package definition for this virtualenv.

===== Versioning =====

In order to maintain one or more distinct instances of the ''mcmc-env'' virtualenv, each must be given a distinct version.  While this could be any unique sequence of characters, it's usually best to adhere to a semantic versioning scheme.  For Python virtualenvs the current date with an extra numerical component (in case multiple instances are created on the same day) is useful:  ''YYYYMMDD.##'', where the four-digit year and zero-padded month and day number are used.  The trailing ''##'' is also zero-padded.
For example, this recipe uses ''20200716.01'' since today is July 16, 2020, and it's the first virtualenv created today.

===== Create the Virtualenv =====

Use the ''conda'' command to create the virtualenv and populate it with the most basic dependencies:

<code bash>
$ conda create --prefix="${WORKDIR}/sw/mcmc-env/20200716.01" --copy pip python=3 numpy scipy matplotlib astropy
</code>

These basic dependencies do not pull in an extensive list of other packages; most notably, they do not pull in ''mpi4py'', which must be built against the cluster's MPI library.

If the virtualenv build succeeds (and why shouldn't it!) the new virtualenv can be activated and the location of the ''pip'' command inside it can be verified:

<code bash>
$ source activate "${WORKDIR}/sw/mcmc-env/20200716.01"
$ which pip
/work/workgroup/sw/mcmc-env/20200716.01/bin/pip
</code>

===== Update and Install Other Packages =====

For the sake of the ''emcee'' and ''pyKLIP'' packages it is recommended that a few components be updated or installed first:

<code bash>
$ pip install -U setuptools setuptools_scm pep517 pytest
</code>

The ''pytest'' package is a soft requirement of the ''debrisdiskfm'' package.

Next, install //from source// the ''mpi4py'' package as well as a few others:

<code bash>
$ pip install --no-binary :all: mpi4py corner image_registration git+https://github.com/seawander/DebrisDiskFM.git#egg=Package schwimmbad
</code>

The ''image_registration'' package is another soft requirement of the ''debrisdiskfm'' package.  This step may take some time since binary parts of the packages must be compiled and linked (e.g. the shared library containing hooks into the cluster MPI library for ''mpi4py'') before being copied into the virtualenv.

===== Install pyKLIP =====

The ''pyKLIP'' package must be installed from a cloned git source tree.
The ''emcee'' package is a requirement of ''pyKLIP'' and will be installed automatically via ''pip''((The ''emcee'' package is pure Python, without any binary components that would require compiling and linking, so it is acceptable to let ''setuptools'' satisfy the dependency using ''pip''.  This is not true of all of the ''pyKLIP'' required packages, though.)).

Rather than using the head of the git source tree -- corresponding to ongoing revisions/bug-fixes and not a specific release of the package -- the most recent release tag will be used:

<code bash>
$ git clone https://bitbucket.org/pyKLIP/pyklip.git pyklip-src
$ cd pyklip-src
$ git checkout v2.1
</code>

Installation into the virtualenv proceeds simply:

<code bash>
$ python setup.py install
</code>

Followed by removal of the cloned source:

<code bash>
$ cd ..
$ rm -rf pyklip-src
</code>

===== Completed virtualenv =====

At this point the packages present in the completed virtualenv can be listed:

<code bash>
$ pip list
Package            Version
------------------ -------------------
astropy            4.0.1.post1
certifi            2020.6.20
corner             2.1.0
cycler             0.10.0
debrisdiskfm       0.1.0
emcee              3.0.2
FITS-tools         0.2
gqueue             3.0.3
image-registration 0.2.4
kiwisolver         1.2.0
matplotlib         3.2.2
mkl-fft            1.1.0
mkl-random         1.1.1
mkl-service        2.3.0
mpi4py             3.0.3
numpy              1.18.5
pep517             0.8.2
pip                20.1.1
pyklip             2.1
pyparsing          2.4.7
python-dateutil    2.8.1
schwimmbad         0.3.1
scipy              1.5.0
setuptools         49.2.0.post20200714
setuptools-scm     4.1.2
sip                4.19.13
six                1.15.0
toml               0.10.1
tornado            6.0.4
wheel              0.34.2
</code>

Version numbers may differ as newer versions of the component packages are released, but at the very least the

  * emcee
  * debrisdiskfm
  * schwimmbad
  * astropy
  * matplotlib
  * numpy
  * pyklip

packages should all be present in the list.

Return the shell environment to its original state before proceeding:

<code bash>
$ vpkg_rollback all
</code>

===== VALET Package Definition =====

To simplify the addition of this virtualenv to the runtime environment, a VALET package definition file should be created.
If the virtualenv was installed for use by one or more members of the workgroup, the file should be created as ''${WORKDIR}/sw/valet/«env-name».vpkg_yaml''.  If the virtualenv is for personal use, create the file as ''~/.valet/«env-name».vpkg_yaml''.  Naturally, you must ensure that the directory (''${WORKDIR}/sw/valet'' or ''~/.valet'', respectively) exists prior to creating the package definition file.

The package definition file uses the [[https://yaml.org|YAML]] format.  Information to know before proceeding:

^Item^Description^Value in this example^
|''«env-name»''|The name of the virtualenv created|''mcmc-env''|
|''«prefix»''|The directory in which the ''«env-name»'' directory to hold the virtualenv was created|''/work/workgroup/sw''|
|''«version»''|The version identifier chosen for this instance of the virtualenv|''20200716.01''|
|''«vpkg-file»''|The path to the virtualenv package definition file|''/work/workgroup/sw/valet/mcmc-env.vpkg_yaml''|

The package definition file should follow this format:

<code yaml>
«env-name»:
    prefix: «prefix»/«env-name»
    description: emcee and pyKLIP python environments
    flags:
        - no-standard-paths
    default-version: "«version»"
    actions:
        - action: source
          script:
              sh: anaconda-activate.sh
          success: 0
    versions:
        "«version»":
            description: July 16, 2020, build 01
            dependencies:
                - openmpi/3.1.3
                - intel-python/2019u5:python3
</code>

For the virtualenv created in the course of this recipe, the resulting file ''«vpkg-file»'' would look like:

<code yaml>
mcmc-env:
    prefix: /work/workgroup/sw/mcmc-env
    description: emcee and pyKLIP python environments
    flags:
        - no-standard-paths
    default-version: "20200716.01"
    actions:
        - action: source
          script:
              sh: anaconda-activate.sh
          success: 0
    versions:
        "20200716.01":
            description: July 16, 2020, build 01
            dependencies:
                - openmpi/3.1.3
                - intel-python/2019u5:python3
</code>

The file can be checked for proper syntax:

<code bash>
$ vpkg_check "«vpkg-file»"
</code>

If all is okay, then the virtualenv can be loaded in login sessions or in job scripts as:

<code bash>
$ vpkg_require «env-name»/«version»
</code>

which for this recipe is:

<code bash>
$ vpkg_require mcmc-env/20200716.01
Adding dependency `libfabric/1.6.1` to your environment
Adding dependency `openmpi/3.1.3` to your environment
Adding dependency `intel-python/2019u5:python3` to your environment
Adding package `mcmc-env/20200716.01` to your environment
$ which python3
/work/workgroup/sw/mcmc-env/20200716.01/bin/python3
$ which pip
/work/workgroup/sw/mcmc-env/20200716.01/bin/pip
</code>

==== Adding New Instances of the virtualenv ====

Each new instance of the virtualenv must be assigned a unique version identifier before it is created.  Add an entry for the new version to the //versions// dictionary in the package definition file:

<code yaml>
mcmc-env:
    prefix: /work/workgroup/sw/mcmc-env
    description: emcee and pyKLIP python environments
    flags:
        - no-standard-paths
    default-version: "20200716.01"
    actions:
        - action: source
          script:
              sh: anaconda-activate.sh
          success: 0
    versions:
        "20200716.01":
            description: July 16, 2020, build 01
            dependencies:
                - openmpi/3.1.3
                - intel-python/2019u5:python3
        "20200716.02":
            description: July 16, 2020, build 02
            dependencies:
                - openmpi/4.0.2
                - intel-python/2019u5:python3
</code>

The dependencies will change if different versions of the Open MPI library or Intel distribution for Python were used when creating the virtualenv.  If the new version should become the default (e.g. the one loaded when ''vpkg_require mcmc-env/default'' is used) then alter the ''default-version'' value accordingly.
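
Choosing the identifier for a new instance follows the ''YYYYMMDD.##'' scheme described under Versioning. The following is a minimal sketch of that bookkeeping, not part of VALET or conda: the function name ''next_version'' and its arguments are hypothetical, and it assumes the identifiers already in use (e.g. the keys of the //versions// dictionary) are passed in as plain strings.

```python
import re
from datetime import date

# Matches identifiers of the form YYYYMMDD.## (e.g. 20200716.01).
VERSION_RE = re.compile(r"^(\d{8})\.(\d{2})$")


def next_version(existing, today):
    """Return the next unused YYYYMMDD.## identifier for the date `today`.

    `existing` is any iterable of version strings already in use; entries
    that do not match the scheme, or that belong to other dates, are ignored.
    (Hypothetical helper for illustration only.)
    """
    stamp = today.strftime("%Y%m%d")
    builds = []
    for version in existing:
        match = VERSION_RE.match(version)
        if match and match.group(1) == stamp:
            builds.append(int(match.group(2)))
    # The first build of the day is 01; otherwise use the highest build plus one.
    return f"{stamp}.{max(builds, default=0) + 1:02d}"


if __name__ == "__main__":
    # With 20200716.01 already present, a second build on July 16, 2020
    # receives the identifier 20200716.02.
    print(next_version(["20200716.01"], date(2020, 7, 16)))
```

The resulting string would then be used both as the final component of the ''--prefix'' path given to ''conda create'' and as the new key under //versions// in the package definition file.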