====== R: Runtime-configuration BLAS/LAPACK ======
The R Project for Statistical Computing is used on our clusters by a wide variety of scientific disciplines. Though the breadth of applications is wide, many of them require the functionality of BLAS/LAPACK libraries. R provides its own baseline implementations that will build on any system; naturally, one cannot expect these BLAS/LAPACK libraries to be highly performant relative to implementations like:
* Intel Math Kernel Library (MKL)
* Automatically-Tuned Linear Algebra Software (ATLAS)
The build procedure for R allows the package to be configured for building against external BLAS/LAPACK libraries. Once the base R build has completed and the resulting software has been installed, additional R libraries can be configured and installed atop it. It has been noted in the past that:
- Producing //N// such builds of R that vary only in the choice of underlying BLAS/LAPACK:
* can require on the order of //N// times the disk space of a single build
* puts a greater burden on the sysadmin to maintain all //N// similarly-outfitted copies
- R only makes use of standardized BLAS/LAPACK APIs, so it should be possible to choose any standard BLAS/LAPACK library at runtime (not just at build time).
===== Substituting an alternate library =====
Others have published articles detailing how to substitute the ATLAS library into a basic R build (one that was built with its bundled BLAS/LAPACK):
* [[https://www.r-bloggers.com/r-r-with-atlas-r-with-openblas-and-revolution-r-open-which-is-fastest/]]
* [[https://stackoverflow.com/questions/29984141/does-installing-blas-atlas-mkl-openblas-will-speed-up-r-package-that-is-written]]
The basic idea is:
* copy ''libatlas.so'' to ''**R_HOME**/lib64/R/lib''
* remove ''libRblas.so'' and ''libRlapack.so'' from ''**R_HOME**/lib64/R/lib''
* symlink ''libRblas.so'' and ''libRlapack.so'' to ''libatlas.so'' in ''**R_HOME**/lib64/R/lib''
This copy of R is configured to use ''**R_HOME**/lib64/R/lib'' to resolve shared libraries, so when the ''R'' command (for example) is executed, the symlinks lead the runtime linker to the ATLAS library when it resolves BLAS/LAPACK functions.
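The three steps above can be sketched as a small shell function. This is only an illustration: the function name ''substitute_atlas'' and its arguments are hypothetical, and renaming the bundled libraries to ''.orig'' instead of deleting them is an assumption made here so the change can be undone.

```shell
# Sketch: replace R's bundled BLAS/LAPACK with symlinks to ATLAS.
# $1 = R library directory (e.g. .../lib64/R/lib), $2 = path to libatlas.so
substitute_atlas() {
    local rlib="$1" atlas="$2" name
    if [ ! -d "$rlib" ] || [ ! -f "$atlas" ]; then
        echo "usage: substitute_atlas RLIBDIR LIBATLAS" >&2
        return 1
    fi
    cp "$atlas" "$rlib/libatlas.so" || return 1
    for name in libRblas.so libRlapack.so; do
        # Assumption: keep the bundled library under a new name rather than delete it
        if [ -f "$rlib/$name" ] && [ ! -L "$rlib/$name" ]; then
            mv "$rlib/$name" "$rlib/$name.orig" || return 1
        fi
        # Relative symlink, so the link stays valid if the whole tree is moved
        ln -sfn libatlas.so "$rlib/$name" || return 1
    done
}
```

It would be called as ''substitute_atlas ${R_HOME}/lib64/R/lib /path/to/libatlas.so'' by a user who is allowed to modify the installation.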
This scheme requires two things:
- the user must have ownership of the R installation or sufficient privileges to alter the files
- the BLAS/LAPACK substitution will happen one time only (probably shortly after the library is built)
While the first condition is obvious, the second may not seem important, especially for a build of R being maintained by an arbitrary user in an arbitrary location on the filesystem. However, computational reproducibility demands that any alteration to the underlying BLAS/LAPACK be present -- or at least able to be restored -- at any time. This is one reason why ''libatlas.so'' was copied into the build and symlinks were used: with other BLAS/LAPACK libraries also present, the ''libRblas.so'' and ''libRlapack.so'' symlinks can be altered as necessary. The caveats, however, are that:
* only a single choice of underlying BLAS/LAPACK can be active
* the underlying BLAS/LAPACK can be changed only when that build of R is not being executed/used
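Because only one choice can be active at a time, it helps to be able to see which one that is before starting a job. A minimal sketch that reports what the two library names in the base directory currently are; the function name ''active_blas'' is hypothetical:

```shell
# Sketch: show what the base-directory BLAS/LAPACK names currently resolve to.
# $1 = R library directory (e.g. .../lib64/R/lib)
active_blas() {
    local rlib="$1" name
    for name in libRblas.so libRlapack.so; do
        if [ -L "$rlib/$name" ]; then
            # A symlink: print where it points
            echo "$name -> $(readlink "$rlib/$name")"
        elif [ -e "$rlib/$name" ]; then
            echo "$name (regular file)"
        else
            echo "$name (absent)"
        fi
    done
}
```

Recording this output alongside results is one inexpensive way to keep track of which BLAS/LAPACK a given run used.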
A simple way to organize multiple underlying BLAS/LAPACK libraries in a single R installation is to create subdirectories for each variant:
^Path ^Description^
|''**R_HOME**/lib64/R/lib'' |base directory where R looks for shared libraries by default|
|''**R_HOME**/lib64/R/lib/libRblas.so'' |symlink to chosen BLAS library (from one of the subdirectories herein)|
|''**R_HOME**/lib64/R/lib/libRlapack.so'' |symlink to chosen LAPACK library (from one of the subdirectories herein)|
|''**R_HOME**/lib64/R/lib/atlas'' |directory to hold ''libatlas.so''|
|''**R_HOME**/lib64/R/lib/rblas'' |directory to hold the bundled ''libRblas.so'' and ''libRlapack.so'' produced by R build procedure|
|''**R_HOME**/lib64/R/lib/mkl'' |directory to hold MKL variants|
|''**R_HOME**/lib64/R/lib/mkl/seq'' |directory to hold sequential MKL variant|
|''**R_HOME**/lib64/R/lib/mkl/thr'' |directory to hold threaded MKL variant|
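The layout in the table could be created along these lines. This is a sketch, not the procedure actually used: the function name ''make_blas_layout'' is hypothetical, and it assumes the bundled libraries are still regular files in the base directory when it runs.

```shell
# Sketch: build the per-variant subdirectory layout inside R's library directory.
# $1 = R library directory (the base directory in the table above)
make_blas_layout() {
    local rlib="$1" name
    mkdir -p "$rlib/atlas" "$rlib/rblas" "$rlib/mkl/seq" "$rlib/mkl/thr" || return 1
    for name in libRblas.so libRlapack.so; do
        # Move only real files; a symlink here would already point into a subdirectory
        if [ -f "$rlib/$name" ] && [ ! -L "$rlib/$name" ]; then
            mv "$rlib/$name" "$rlib/rblas/$name" || return 1
        fi
    done
}
```

After it runs, the base directory no longer contains ''libRblas.so'' or ''libRlapack.so'' until symlinks to one of the variants are created.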
==== R BLAS/LAPACK ====
When we restructured the R ''lib64/R/lib'' directory, the bundled ''libRblas.so'' and ''libRlapack.so'' shared library files were moved to the ''rblas'' subdirectory. To configure R to use its bundled libraries:
$ cd ${R_HOME}/lib64/R/lib
$ rm -f libR{blas,lapack}.so
$ ln -s rblas/libRblas.so .
$ ln -s rblas/libRlapack.so .
==== ATLAS ====
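The ATLAS library contains both BLAS and LAPACK APIs in a single shared library. The commands below assume that the ''**R_HOME**/lib64/R/lib/atlas'' subdirectory holds the copied ''libatlas.so'' together with ''libRblas.so'' and ''libRlapack.so'' symlinks that point at it. With that in place, we configure R to use ATLAS: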
$ cd ${R_HOME}/lib64/R/lib
$ rm -f libR{blas,lapack}.so
$ ln -s atlas/libRblas.so .
$ ln -s atlas/libRlapack.so .
==== Sequential MKL ====
A C source file, ''shim.c'', containing a dummy function was created in ''**R_HOME**/lib64/R/lib/mkl/seq'':
int
mkl_shim_dummy(void)
{
    return 0;
}
The shim library is then created thusly:
$ cd ${R_HOME}/lib64/R/lib/mkl/seq
$ icc -shared -o libRblas.so -mkl=sequential shim.c
$ ln -s libRblas.so libRlapack.so
To configure R to use the sequential MKL:
$ cd ${R_HOME}/lib64/R/lib
$ rm -f libR{blas,lapack}.so
$ ln -s mkl/seq/libRblas.so .
$ ln -s mkl/seq/libRlapack.so .
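The shim presumably works because the link step records the MKL libraries as dependencies of ''libRblas.so'' even though the dummy function never calls them. Whether that happened can be checked by listing the library's ''NEEDED'' entries; a linker run with ''--as-needed'' may drop libraries the shim does not reference. A minimal sketch, assuming ''readelf'' from GNU binutils is installed; the helper name ''needed_libs'' is hypothetical:

```shell
# Sketch: list the shared libraries an ELF file names as direct dependencies.
# $1 = path to a shared library or executable
needed_libs() {
    # readelf prints lines such as: 0x...01 (NEEDED) Shared library: [libfoo.so]
    readelf -d "$1" | sed -n 's/.*(NEEDED).*\[\(.*\)\]/\1/p'
}
```

Running ''needed_libs ${R_HOME}/lib64/R/lib/mkl/seq/libRblas.so'' should list MKL libraries if the shim was linked as intended; the exact names depend on the MKL release.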
==== Threaded MKL ====
A C source file, ''shim.c'', containing a dummy function was created in ''**R_HOME**/lib64/R/lib/mkl/thr'':
int
mkl_shim_dummy(void)
{
    return 0;
}
Since our R build used the GNU C compiler, the threaded MKL variant only works if the shim library is built against the GNU OpenMP runtime. Using just ''-mkl=parallel'' links against the Intel OpenMP runtime, which in testing yielded numerical issues (though not actual crashes). The shim library is then created thusly:
$ cd ${R_HOME}/lib64/R/lib/mkl/thr
$ icc -shared -o libRblas.so shim.c -lmkl_gnu_thread -lmkl_core -lmkl_intel_lp64
$ ln -s libRblas.so libRlapack.so
To configure R to use the threaded MKL:
$ cd ${R_HOME}/lib64/R/lib
$ rm -f libR{blas,lapack}.so
$ ln -s mkl/thr/libRblas.so .
$ ln -s mkl/thr/libRlapack.so .
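The four symlink recipes above differ only in the subdirectory they point at, so they could be folded into one helper. A sketch under two assumptions: the function name ''select_blas'' is hypothetical, and every variant subdirectory (including ''atlas'') is expected to contain both ''libRblas.so'' and ''libRlapack.so''.

```shell
# Sketch: point the base-directory symlinks at one variant subdirectory.
# $1 = R library directory, $2 = variant (rblas, atlas, mkl/seq or mkl/thr)
select_blas() {
    local rlib="$1" variant="$2" name
    for name in libRblas.so libRlapack.so; do
        # Refuse to switch if the variant does not provide both libraries
        if [ ! -e "$rlib/$variant/$name" ]; then
            echo "missing $variant/$name" >&2
            return 1
        fi
    done
    for name in libRblas.so libRlapack.so; do
        # -f replaces an existing link; the target is relative to the base directory
        ln -sfn "$variant/$name" "$rlib/$name" || return 1
    done
}
```

For example, ''select_blas ${R_HOME}/lib64/R/lib mkl/seq'' would have the same effect as the sequential MKL commands shown earlier. As noted before, this should only be done while that build of R is not in use.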
===== Runtime-configurable substitution =====
By stashing each BLAS/LAPACK variant in its own subdirectory, our copy of R is actually fairly close to being runtime-configurable with respect to choice of BLAS/LAPACK. Since all R commands set up the environment so that the runtime linker checks ''**R_HOME**/lib64/R/lib'' for shared libraries, the ''libRblas.so'' and ''libRlapack.so'' symlinks in that directory will always have priority over any other path we might add to ''LD_LIBRARY_PATH'' before issuing, for example, the ''R'' command. However, if ''libRblas.so'' and ''libRlapack.so'' are **not** present in that directory, the runtime linker will be forced to consult other paths present in ''LD_LIBRARY_PATH''.
On our Caviness cluster we include no BLAS/LAPACK library symlinks in the base directory which R checks for shared libraries:
$ cd ${R_HOME}/lib64/R/lib
$ rm -f libR{blas,lapack}.so
In our VALET package definition for R we configure four variants that differ by BLAS/LAPACK (expressed as a feature tag):
$ vpkg_versions r
:
r The R Project for Statistical Computing
3.5 alias to r/3.5.1
* 3.5.1 R 3.5.1 with system compilers, ATLAS
3.5.1:mkl-seq R 3.5.1 with system compilers, MKL (sequential)
3.5.1:mkl-thr R 3.5.1 with system compilers, MKL (multithread)
3.5.1:rblas R 3.5.1 with system compilers, R reference BLAS/LAPACK
with ATLAS as the default (recommended) choice of underlying BLAS/LAPACK. Each variant of the 3.5.1 version uses the same installation prefix, but adds a unique BLAS/LAPACK subdirectory to ''LD_LIBRARY_PATH'' when added to the user's environment:
$ vpkg_require r/3.5.1
Adding package `r/3.5.1` to your environment
$ echo $LD_LIBRARY_PATH
/opt/shared/r/3.5.1/lib64/R/lib/atlas: ...
$ vpkg_rollback all
$ vpkg_require r/3.5.1:rblas
Adding package `r/3.5.1:rblas` to your environment
$ echo $LD_LIBRARY_PATH
/opt/shared/r/3.5.1/lib64/R/lib/rblas:
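To see which directory would supply ''libRblas.so'' for a given ''LD_LIBRARY_PATH'', the search order can be imitated in the shell. This is a simplified sketch: it walks only the colon-separated path, skips empty entries, and ignores everything else the runtime linker consults (such as RPATH entries and the system cache). The function name ''first_provider'' is hypothetical.

```shell
# Sketch: report the first directory in a colon-separated search path that
# contains the named library, in the order the entries are listed.
# $1 = library file name, $2 = search path (defaults to $LD_LIBRARY_PATH)
first_provider() {
    local lib="$1" search="${2-$LD_LIBRARY_PATH}" dir
    local IFS=:
    for dir in $search; do
        # Empty entries are skipped in this simplified version
        if [ -n "$dir" ] && [ -e "$dir/$lib" ]; then
            echo "$dir"
            return 0
        fi
    done
    return 1
}
```

With no ''libRblas.so'' in the base directory, ''first_provider libRblas.so'' should report the variant subdirectory that VALET added.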
This works fine so long as you don't attempt to install any R modules that require BLAS/LAPACK functionality. The R module-building environment defaults to link search paths in the base directory (''**R_HOME**/lib64/R/lib''), where it fails to find ''libRblas.so'' or ''libRlapack.so''. This is easily fixed by altering two lines in the R installation's standard make flags file at ''**R_HOME**/lib64/R/etc/Makeconf'':
BLAS_LIBS = -L"$(R_HOME)/lib$(R_ARCH)/$(R_BLAS_VARIANT)" -lRblas
and
LAPACK_LIBS = -L"$(R_HOME)/lib$(R_ARCH)/$(R_BLAS_VARIANT)" -lRlapack
If ''R_BLAS_VARIANT'' is not set in the user's environment, it expands to nothing and the link search path added here falls back to the usual base directory. But in our VALET configuration of each of the variants, we set the appropriate relative path for ''R_BLAS_VARIANT'':
$ vpkg_require r/3.5.1:rblas
Adding package `r/3.5.1:rblas` to your environment
$ echo $R_BLAS_VARIANT
rblas
$ vpkg_rollback all
$ vpkg_require r/3.5.1:mkl-seq
Adding package `r/3.5.1:mkl-seq` to your environment
$ echo $R_BLAS_VARIANT
mkl/seq
We now have a runtime-configurable BLAS/LAPACK for this single installation of R, and any properly-packaged R modules should build fine against it.
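The two-line ''Makeconf'' change described above could be applied with ''sed'' instead of by hand. This sketch assumes the unmodified lines read ''BLAS_LIBS = -L"$(R_HOME)/lib$(R_ARCH)" -lRblas'' and ''LAPACK_LIBS = -L"$(R_HOME)/lib$(R_ARCH)" -lRlapack''; if a particular build's ''Makeconf'' differs, the patterns will simply not match. The function name ''patch_makeconf'' is hypothetical.

```shell
# Sketch: add $(R_BLAS_VARIANT) to the BLAS/LAPACK link search path in Makeconf.
# $1 = path to Makeconf
patch_makeconf() {
    local makeconf="$1" tmp rc
    # Keep a pristine copy the first time only, so re-running never overwrites it
    [ -e "$makeconf.orig" ] || cp -p "$makeconf" "$makeconf.orig" || return 1
    tmp=$(mktemp) || return 1
    sed -e 's|^\(BLAS_LIBS = -L"\$(R_HOME)/lib\$(R_ARCH)\)" -lRblas|\1/$(R_BLAS_VARIANT)" -lRblas|' \
        -e 's|^\(LAPACK_LIBS = -L"\$(R_HOME)/lib\$(R_ARCH)\)" -lRlapack|\1/$(R_BLAS_VARIANT)" -lRlapack|' \
        "$makeconf" > "$tmp" && cat "$tmp" > "$makeconf"
    rc=$?
    rm -f "$tmp"
    return $rc
}
```

Running it a second time leaves the file unchanged, because the patterns only match the unmodified lines.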
===== VALET configuration =====
Here is the VALET configuration for our installation of R 3.5.1 with runtime-configurable BLAS/LAPACK:
r:
    description: The R Project for Statistical Computing
    url: http://www.r-project.org/
    prefix: /opt/shared/r
    default-version: "3.5.1"
    versions:
        "3.5":
            alias-to: 3.5.1
        "3.5.1":
            description: R 3.5.1 with system compilers, ATLAS
            actions:
                - libdir: lib64/R/lib
                - incdir: lib64/R/include
                - libdir: lib64/R/lib/atlas
                - variable: R_BLAS_VARIANT
                  operator: set
                  value: atlas
        "3.5.1:rblas":
            description: R 3.5.1 with system compilers, R reference BLAS/LAPACK
            prefix: 3.5.1
            actions:
                - libdir: lib64/R/lib
                - incdir: lib64/R/include
                - libdir: lib64/R/lib/rblas
                - variable: R_BLAS_VARIANT
                  operator: set
                  value: rblas
        "3.5.1:mkl-seq":
            description: R 3.5.1 with system compilers, MKL (sequential)
            prefix: 3.5.1
            actions:
                - libdir: lib64/R/lib
                - incdir: lib64/R/include
                - libdir: lib64/R/lib/mkl/seq
                - variable: R_BLAS_VARIANT
                  operator: set
                  value: mkl/seq
        "3.5.1:mkl-thr":
            description: R 3.5.1 with system compilers, MKL (multithread)
            prefix: 3.5.1
            actions:
                - libdir: lib64/R/lib
                - incdir: lib64/R/include
                - libdir: lib64/R/lib/mkl/thr
                - variable: R_BLAS_VARIANT
                  operator: set
                  value: mkl/thr
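On a system without VALET, the two settings this recipe relies on could be approximated with a shell function. This is a rough sketch only: the function name ''use_r_blas'' is hypothetical, it covers just ''LD_LIBRARY_PATH'' and ''R_BLAS_VARIANT'' (not whatever else the ''libdir'' and ''incdir'' actions set up), and unlike ''vpkg_rollback'' it offers no way to undo the change.

```shell
# Sketch: approximate the environment changes made by one VALET variant.
# $1 = installation prefix (e.g. /opt/shared/r/3.5.1), $2 = variant
use_r_blas() {
    local prefix="$1" variant="$2"
    case "$variant" in
        atlas|rblas|mkl/seq|mkl/thr) ;;
        *) echo "unknown variant: $variant" >&2; return 1 ;;
    esac
    # Makeconf reads this when R modules are built
    export R_BLAS_VARIANT="$variant"
    # Variant directory first, then the base directory, then whatever was already set
    export LD_LIBRARY_PATH="$prefix/lib64/R/lib/$variant:$prefix/lib64/R/lib${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}"
}
```

Calling it more than once in the same shell stacks entries in ''LD_LIBRARY_PATH'', so it is best used in a fresh shell or job script.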