R: Runtime-configuration BLAS/LAPACK

The R Project for Statistical Computing is used on our clusters across a wide variety of scientific disciplines. Though the applications vary, many of them require the functionality of BLAS/LAPACK libraries. R provides its own baseline implementations that will build on any system; naturally, one cannot expect them to perform as well as optimized implementations like:

  • Intel Math Kernel Library (MKL)
  • Automatically-Tuned Linear Algebra Software (ATLAS)

The build procedure for R allows the package to be configured for building against external BLAS/LAPACK libraries. Once the base R build has completed and the resulting software has been installed, additional R libraries can be configured and installed atop it. It has been noted in the past that:

  1. Producing N such builds of R that vary only in the choice of underlying BLAS/LAPACK:
    • can require on the order of N times the disk space of a single build
    • puts a greater burden on the sysadmin to maintain all N similarly-outfitted copies
  2. R only makes use of standardized BLAS/LAPACK APIs, so any standard BLAS/LAPACK library should be selectable at runtime (not just at build time).

Others have published articles detailing how to substitute the ATLAS library into a basic R build (one that was built with its bundled BLAS/LAPACK). The basic idea is:

  • copy libatlas.so to R_PREFIX/lib64/R/lib
  • remove libRblas.so and libRlapack.so from R_PREFIX/lib64/R/lib
  • symlink libRblas.so and libRlapack.so to libatlas.so in R_PREFIX/lib64/R/lib

This copy of R is configured to use R_PREFIX/lib64/R/lib to resolve shared libraries, so when executing the R command, for example, the symlinks will lead the runtime linker to the ATLAS library when resolving BLAS/LAPACK functions.
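
The three steps above can be sketched as a small shell function. This is a minimal illustration rather than the published procedure itself: the function name atlas_substitute and its two arguments (the R shared-library directory and the path to an existing libatlas.so) are hypothetical.

```shell
# Hypothetical helper: substitute ATLAS for the bundled BLAS/LAPACK.
#   $1 = R shared-library directory (e.g. ${R_PREFIX}/lib64/R/lib)
#   $2 = path to an existing libatlas.so
atlas_substitute() {
    libdir=$1
    atlas=$2
    # Refuse to touch anything unless both inputs exist.
    [ -d "$libdir" ] && [ -f "$atlas" ] || return 1
    # Copy first; if the copy fails the bundled libraries stay in place.
    cp "$atlas" "$libdir/libatlas.so" || return 1
    # Remove the bundled libraries...
    rm -f "$libdir/libRblas.so" "$libdir/libRlapack.so"
    # ...and replace them with relative symlinks to the ATLAS copy.
    ln -s libatlas.so "$libdir/libRblas.so" &&
    ln -s libatlas.so "$libdir/libRlapack.so"
}
```

Note that the bundled libraries are deleted outright here, which is exactly the limitation discussed next.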

This scheme requires two things:

  1. the user must have ownership of the R installation or sufficient privileges to alter the files
  2. the BLAS/LAPACK substitution will happen one time only (probably shortly after the library is built)

While the first condition is obvious, the second may not seem important, especially for a build of R being maintained by an arbitrary user in an arbitrary location on the filesystem. However, computational reproducibility would demand that any alteration to the underlying BLAS/LAPACK be present – or at least able to be restored – at any time. This is one reason why libatlas.so was copied into the build and symlinks were used: having other BLAS/LAPACK libraries present, the libRblas.so and libRlapack.so symlinks can be altered as necessary. The caveat, however, is that:

  • only a single choice of underlying BLAS/LAPACK can be active
  • the underlying BLAS/LAPACK can be changed only when that build of R is not being executed/used

A simple way to organize multiple underlying BLAS/LAPACK libraries in a single R installation is to create subdirectories for each variant:

Path                                Description
R_PREFIX/lib64/R/lib                base directory where R looks for shared libraries by default
R_PREFIX/lib64/R/lib/libRblas.so    symlink to chosen BLAS library (from one of the subdirectories herein)
R_PREFIX/lib64/R/lib/libRlapack.so  symlink to chosen LAPACK library (from one of the subdirectories herein)
R_PREFIX/lib64/R/lib/atlas          directory to hold libatlas.so
R_PREFIX/lib64/R/lib/rblas          directory to hold the bundled libRblas.so and libRlapack.so produced by R build procedure
R_PREFIX/lib64/R/lib/mkl            directory to hold MKL variants
R_PREFIX/lib64/R/lib/mkl/seq        directory to hold sequential MKL variant
R_PREFIX/lib64/R/lib/mkl/thr        directory to hold threaded MKL variant
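
One way to arrive at this layout from a freshly installed R is sketched below. The function name r_make_blas_dirs is hypothetical; it only creates the subdirectories listed above and moves the bundled libraries into rblas, leaving the ATLAS and MKL subdirectories to be populated separately.

```shell
# Hypothetical helper: create the per-variant subdirectories.
#   $1 = R shared-library directory (e.g. ${R_PREFIX}/lib64/R/lib)
r_make_blas_dirs() {
    libdir=$1
    [ -d "$libdir" ] || return 1
    mkdir -p "$libdir/atlas" "$libdir/rblas" \
             "$libdir/mkl/seq" "$libdir/mkl/thr" || return 1
    for lib in libRblas.so libRlapack.so; do
        # Only move regular files; a symlink here means the directory
        # was already restructured, so it is left alone.
        if [ -f "$libdir/$lib" ] && [ ! -L "$libdir/$lib" ]; then
            mv "$libdir/$lib" "$libdir/rblas/$lib" || return 1
        fi
    done
}
```

After this step the base directory holds no libRblas.so or libRlapack.so until symlinks are created for one of the variants.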

When we restructured the R lib64/R/lib directory, the bundled libRblas.so and libRlapack.so shared library files were moved to the rblas subdirectory. To configure R to use its bundled libraries:

$ cd ${R_PREFIX}/lib64/R/lib
$ rm -f libR{blas,lapack}.so
$ ln -s rblas/libRblas.so .
$ ln -s rblas/libRlapack.so .

The ATLAS library contains both BLAS and LAPACK APIs in a single shared library. With libatlas.so copied into the R_PREFIX/lib64/R/lib/atlas subdirectory, and libRblas.so and libRlapack.so symlinks to it created alongside it, we configure R to use ATLAS:

$ cd ${R_PREFIX}/lib64/R/lib
$ rm -f libR{blas,lapack}.so
$ ln -s atlas/libRblas.so .
$ ln -s atlas/libRlapack.so .

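MKL is delivered as several libraries rather than a single libRblas.so, so a small shim shared library is built whose only job is to pull in the MKL libraries it is linked against. For the sequential variant, a C source file containing a dummy function was created in R_PREFIX/lib64/R/lib/mkl/seq: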

shim.c
int
mkl_shim_dummy(void)
{
	return 0;
}

The shim library is then created thusly:

$ cd ${R_PREFIX}/lib64/R/lib/mkl/seq
$ icc -shared -o libRblas.so -mkl=sequential shim.c
$ ln -s libRblas.so libRlapack.so

To configure R to use the sequential MKL:

$ cd ${R_PREFIX}/lib64/R/lib
$ rm -f libR{blas,lapack}.so
$ ln -s mkl/seq/libRblas.so .
$ ln -s mkl/seq/libRlapack.so .

For the threaded MKL variant, a C source file containing the same dummy function was created in R_PREFIX/lib64/R/lib/mkl/thr:

shim.c
int
mkl_shim_dummy(void)
{
	return 0;
}

Since our R build used the GNU C compiler, the threaded MKL variant only works if the shim library is built against the GNU OpenMP runtime. Using just "-mkl=parallel" links against the Intel OpenMP runtime, which in testing yielded numerical issues (though no actual crashes). The shim library is therefore linked explicitly against the GNU-threaded MKL libraries:

$ cd ${R_PREFIX}/lib64/R/lib/mkl/thr
$ icc -shared -o libRblas.so shim.c -lmkl_gnu_thread -lmkl_core -lmkl_intel_lp64
$ ln -s libRblas.so libRlapack.so
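
Because the choice of OpenMP runtime matters here, it can be worth checking which libraries a shim actually pulls in. The sketch below filters ldd output for MKL and OpenMP libraries. The function name shim_deps is hypothetical, and it assumes the usual library names libiomp5 (Intel OpenMP) and libgomp (GNU OpenMP).

```shell
# Hypothetical helper: read `ldd` output on stdin and print only the
# names of MKL and OpenMP runtime libraries found in it.
shim_deps() {
    # The first whitespace-separated field of each ldd line is the
    # library name, e.g. "libmkl_core.so".
    awk '$1 ~ /^lib(mkl_|iomp5|gomp)/ { print $1 }'
}

# Usage (run in the directory holding the shim):
#   ldd libRblas.so | shim_deps
```

Seeing libiomp5 listed for the shim in mkl/thr would suggest that the Intel OpenMP runtime was linked after all.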

To configure R to use the threaded MKL:

$ cd ${R_PREFIX}/lib64/R/lib
$ rm -f libR{blas,lapack}.so
$ ln -s mkl/thr/libRblas.so .
$ ln -s mkl/thr/libRlapack.so .
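
The four recipes above differ only in the subdirectory being linked, so they can be folded into one helper. This is a minimal sketch, assuming each variant subdirectory already contains both libRblas.so and libRlapack.so; the function name r_select_blas is hypothetical.

```shell
# Hypothetical helper: point the base-directory symlinks at one variant.
#   $1 = R shared-library directory (e.g. ${R_PREFIX}/lib64/R/lib)
#   $2 = variant subdirectory: rblas, atlas, mkl/seq or mkl/thr
r_select_blas() {
    libdir=$1
    variant=$2
    # Verify both libraries exist before removing the current symlinks.
    for lib in libRblas.so libRlapack.so; do
        [ -e "$libdir/$variant/$lib" ] || {
            echo "missing $variant/$lib in $libdir" >&2
            return 1
        }
    done
    # Replace the symlinks with relative links into the variant directory.
    for lib in libRblas.so libRlapack.so; do
        rm -f "$libdir/$lib"
        ln -s "$variant/$lib" "$libdir/$lib" || return 1
    done
}
```

For example, `r_select_blas "${R_PREFIX}/lib64/R/lib" mkl/seq` reproduces the sequential MKL recipe. As noted earlier, the switch should only be made while that build of R is not in use.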

By stashing each BLAS/LAPACK variant in its own subdirectory, our copy of R is actually fairly close to being runtime-configurable with respect to the choice of BLAS/LAPACK. Since every R command sets up the environment so that the runtime linker checks R_PREFIX/lib64/R/lib for shared libraries, the libRblas.so and libRlapack.so symlinks in that directory will always take priority over any other path we might add to LD_LIBRARY_PATH before issuing the R command. However, if libRblas.so and libRlapack.so are not present in that directory, the runtime linker is forced to consult the other paths present in LD_LIBRARY_PATH.
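
The search-order argument can be illustrated with a small function that walks a colon-separated directory list and reports the first directory containing a given file. It only mimics the first-match behaviour described above; the real runtime linker also consults other sources (for example RPATH entries and the system library cache). The function name first_dir_with is hypothetical.

```shell
# Hypothetical helper: print the first directory in a colon-separated
# list that contains the named file; return 1 if none does.
#   $1 = file name (e.g. libRblas.so)
#   $2 = colon-separated directory list
first_dir_with() {
    name=$1
    paths=$2
    old_ifs=$IFS
    IFS=:
    for dir in $paths; do
        # Empty entries are skipped in this sketch.
        if [ -n "$dir" ] && [ -e "$dir/$name" ]; then
            IFS=$old_ifs
            printf '%s\n' "$dir"
            return 0
        fi
    done
    IFS=$old_ifs
    return 1
}
```

With the base directory listed first, `first_dir_with libRblas.so "${R_PREFIX}/lib64/R/lib:${LD_LIBRARY_PATH}"` reports the base directory whenever a symlink exists there, and falls through to a later entry only when it does not.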

On our Caviness cluster we include no BLAS/LAPACK library symlinks in the base directory which R checks for shared libraries:

$ cd ${R_PREFIX}/lib64/R/lib
$ rm -f libR{blas,lapack}.so

In our VALET package definition for R we configure four variants that differ by BLAS/LAPACK (as a feature tag):

$ vpkg_versions r
   :
r                The R Project for Statistical Computing
  3.5            alias to r/3.5.1
* 3.5.1          R 3.5.1 with system compilers, ATLAS
  3.5.1:mkl-seq  R 3.5.1 with system compilers, MKL (sequential)
  3.5.1:mkl-thr  R 3.5.1 with system compilers, MKL (multithread)
  3.5.1:rblas    R 3.5.1 with system compilers, R reference BLAS/LAPACK

with ATLAS as the default (recommended) choice of underlying BLAS/LAPACK. Each variant of the 3.5.1 version uses the same installation prefix, but adds a unique BLAS/LAPACK subdirectory to LD_LIBRARY_PATH when added to the user's environment:

$ vpkg_require r/3.5.1
Adding package `r/3.5.1` to your environment
$ echo $LD_LIBRARY_PATH
/opt/shared/r/3.5.1/lib64/R/lib/atlas: ...
$ vpkg_rollback all
$ vpkg_require r/3.5.1:rblas
Adding package `r/3.5.1:rblas` to your environment
$ echo $LD_LIBRARY_PATH
/opt/shared/r/3.5.1/lib64/R/lib/rblas
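
Outside of VALET, the same effect can be approximated by prepending the variant subdirectory to LD_LIBRARY_PATH by hand. This is a minimal sketch, assuming the restructured layout described earlier with no symlinks in the base directory; the function name r_blas_path is hypothetical, and the installation prefix in the usage comment is taken from the example output above.

```shell
# Hypothetical helper: build an LD_LIBRARY_PATH value with the chosen
# BLAS/LAPACK variant subdirectory in front.
#   $1 = R installation prefix
#   $2 = variant subdirectory: rblas, atlas, mkl/seq or mkl/thr
#   $3 = existing LD_LIBRARY_PATH value (may be empty)
r_blas_path() {
    prefix=$1
    variant=$2
    current=$3
    dir="$prefix/lib64/R/lib/$variant"
    if [ -n "$current" ]; then
        printf '%s:%s\n' "$dir" "$current"
    else
        # Avoid a trailing colon when there is nothing to append.
        printf '%s\n' "$dir"
    fi
}

# Usage:
#   LD_LIBRARY_PATH=$(r_blas_path /opt/shared/r/3.5.1 mkl/seq "$LD_LIBRARY_PATH") R
```

Because the variable is set only for that one command, different R sessions can use different variants at the same time.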

Last modified: 2018-12-10 12:27 by frey