R: Runtime-configuration BLAS/LAPACK

The R Project for Statistical Computing is used on our clusters across a wide variety of scientific disciplines. Although the applications vary widely, many of them require the functionality of BLAS/LAPACK libraries. R provides its own baseline implementations that will build on any system. Naturally, these cannot be expected to perform as well as optimized implementations such as:

  • Intel Math Kernel Library (MKL)
  • Automatically-Tuned Linear Algebra Software (ATLAS)

The R build procedure allows R to be configured to build against external BLAS/LAPACK libraries. Once the base R build has been completed and installed, additional R packages can be configured and installed on top of it. It has been noted in the past that:

  1. Producing N such builds of R that vary only in the choice of underlying BLAS/LAPACK:
    • can require on the order of N times the disk space of a single build
    • puts a greater burden on the sysadmin to maintain all N similarly-outfitted copies
  2. R only makes use of standardized BLAS/LAPACK APIs, so any standard BLAS/LAPACK library could be chosen at runtime (not just at build time).

Others have published articles describing how to substitute the ATLAS library into a basic R build (one built with its bundled BLAS/LAPACK).

The basic idea is:

  • copy libatlas.so to R_PREFIX/lib64/R/lib
  • remove libRblas.so and libRlapack.so from R_PREFIX/lib64/R/lib
  • symlink libRblas.so and libRlapack.so to libatlas.so in R_PREFIX/lib64/R/lib
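The three steps above can be sketched as a small shell function. This is a minimal illustration under stated assumptions, not the procedure from the articles mentioned: the function name substitute_atlas and its two arguments are hypothetical, and the R_PREFIX/lib64/R/lib layout is assumed. As in the basic idea, step 2 deletes the bundled libraries outright.

```shell
# Hypothetical helper (not from the original articles): apply the ATLAS
# substitution to an R installation rooted at $1, using the ATLAS library $2.
substitute_atlas() {
    local r_prefix=$1 atlas_lib=$2
    local libdir="$r_prefix/lib64/R/lib"

    [ -d "$libdir" ] || { echo "substitute_atlas: no such directory: $libdir" >&2; return 1; }
    [ -f "$atlas_lib" ] || { echo "substitute_atlas: no such file: $atlas_lib" >&2; return 1; }

    # Step 1: copy the ATLAS library into R's library directory.
    cp -- "$atlas_lib" "$libdir/libatlas.so" || return 1

    # Step 2: remove the bundled BLAS/LAPACK libraries (irreversible).
    rm -f -- "$libdir/libRblas.so" "$libdir/libRlapack.so"

    # Step 3: point both names at ATLAS; relative targets resolve inside $libdir.
    ln -s libatlas.so "$libdir/libRblas.so" &&
        ln -s libatlas.so "$libdir/libRlapack.so"
}
```

For example: substitute_atlas "${R_PREFIX}" /path/to/libatlas.so, where the ATLAS path is a placeholder for wherever the library was installed.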

This copy of R is configured to use R_PREFIX/lib64/R/lib to resolve shared libraries, so when the R command is executed, the symlinks lead the runtime linker to the ATLAS library when it resolves BLAS/LAPACK functions.

This scheme comes with two conditions:

  1. the user must have ownership of the R installation or sufficient privileges to alter the files
  2. the BLAS/LAPACK substitution will happen one time only (probably shortly after the R build is installed)

While the first condition is obvious, the second may seem unimportant, especially for a build of R maintained by an arbitrary user in an arbitrary location on the filesystem. However, computational reproducibility demands that any alteration to the underlying BLAS/LAPACK remain in place, or at least be restorable, at any time. This is one reason libatlas.so was copied into the build and symlinks were used: with other BLAS/LAPACK libraries also present in the build, the libRblas.so and libRlapack.so symlinks can be repointed as necessary. The caveat, however, is that:

  • only a single choice of underlying BLAS/LAPACK can be active
  • the underlying BLAS/LAPACK can be changed only when that build of R is not being executed/used
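One way to honor the second caveat is to check, before touching the symlinks, whether any running process has a file from that R build's library directory mapped. Since libRblas.so resolves to a file under that directory in the schemes discussed here, a running R from this build should show up. This is a sketch assuming Linux and its /proc/PID/maps files; r_lib_in_use is a hypothetical helper, and without root privileges it generally cannot see other users' processes.

```shell
# Hypothetical helper (not from the original setup): succeed if any process
# this user can inspect has a file under the given directory mapped.
# Assumes Linux /proc; without root, other users' processes are skipped.
r_lib_in_use() {
    local libdir maps
    # /proc/PID/maps records canonical paths, so canonicalize the argument.
    libdir=$(readlink -f -- "$1") || return 2
    for maps in /proc/[0-9]*/maps; do
        # Unreadable maps (other users) or exited processes are ignored.
        if grep -qF -- "$libdir/" "$maps" 2>/dev/null; then
            return 0
        fi
    done
    return 1
}
```

For example:

$ r_lib_in_use "${R_PREFIX}/lib64/R/lib" && echo "this R build is in use; not switching"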

A simple way to organize multiple underlying BLAS/LAPACK libraries in a single R installation is to create subdirectories for each variant:

Path                                  Description
R_PREFIX/lib64/R/lib                  base directory where R looks for shared libraries by default
R_PREFIX/lib64/R/lib/libRblas.so      symlink to chosen BLAS library (from one of the subdirectories herein)
R_PREFIX/lib64/R/lib/libRlapack.so    symlink to chosen LAPACK library (from one of the subdirectories herein)
R_PREFIX/lib64/R/lib/atlas            directory to hold libatlas.so
R_PREFIX/lib64/R/lib/rblas            directory to hold the bundled libRblas.so and libRlapack.so produced by R build procedure
R_PREFIX/lib64/R/lib/mkl              directory to hold MKL variants
R_PREFIX/lib64/R/lib/mkl/seq          directory to hold sequential MKL variant
R_PREFIX/lib64/R/lib/mkl/thr          directory to hold threaded MKL variant
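With this layout, switching variants amounts to repointing the two top-level symlinks. The following is a minimal sketch assuming the subdirectory and file names from the table (with each MKL directory holding a libRblas.so shim and a libRlapack.so link to it). The function select_blas and the variant labels are hypothetical. It uses relative link targets and refuses to create dangling links.

```shell
# Hypothetical helper (not from the original setup): point libRblas.so and
# libRlapack.so in R's lib directory ($1) at one variant subdirectory ($2).
select_blas() {
    local libdir=$1 variant=$2 blas lapack target
    case "$variant" in
        atlas)           blas=atlas/libatlas.so     lapack=atlas/libatlas.so ;;
        rblas)           blas=rblas/libRblas.so     lapack=rblas/libRlapack.so ;;
        mkl/seq|mkl/thr) blas=$variant/libRblas.so  lapack=$variant/libRlapack.so ;;
        *) echo "select_blas: unknown variant: $variant" >&2; return 1 ;;
    esac
    # Refuse to create dangling links; the existing links are left untouched.
    for target in "$blas" "$lapack"; do
        if [ ! -e "$libdir/$target" ]; then
            echo "select_blas: missing $libdir/$target" >&2
            return 1
        fi
    done
    # Relative targets resolve against $libdir. ln -sf replaces each link
    # non-atomically, so switch only while this R build is not running.
    ln -sfn -- "$blas" "$libdir/libRblas.so" &&
        ln -sfn -- "$lapack" "$libdir/libRlapack.so"
}
```

For example, select_blas "${R_PREFIX}/lib64/R/lib" mkl/seq selects the sequential MKL shim. Afterwards, readlink -f "${R_PREFIX}/lib64/R/lib/libRblas.so" prints the file actually in use, which can be recorded alongside results for reproducibility.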

The ATLAS library contains both the BLAS and LAPACK APIs in a single shared library, so both the libRblas.so and libRlapack.so symlinks point to it. The Intel MKL contains both APIs as well, but is modularized according to the threading model of the runtime environment: sequential (non-threaded) or OpenMP (multithreaded). Our solution is to build a shim library linked against the appropriate Intel libraries. The shim itself exports no BLAS or LAPACK functions. When R loads libRblas.so or libRlapack.so, the runtime linker also loads the MKL libraries the shim depends on, and R's BLAS/LAPACK calls resolve against those.

Sequential MKL shim

A C source file containing a dummy function was created in R_PREFIX/lib64/R/lib/mkl/seq:

shim.c
int
mkl_shim_dummy(void)
{
	return 0;
}

The shim library is then created as follows:

$ cd ${R_PREFIX}/lib64/R/lib/mkl/seq
$ icc -shared -o libRblas.so -mkl=sequential shim.c
$ ln -s libRblas.so libRlapack.so

Threaded MKL shim

A C source file containing the same dummy function was created in R_PREFIX/lib64/R/lib/mkl/thr:

shim.c
int
mkl_shim_dummy(void)
{
	return 0;
}

Because our R build used the GNU C compiler, the threaded MKL variant works only if the shim library is linked against MKL's GNU OpenMP threading layer (libmkl_gnu_thread). Using "-mkl=parallel" alone links against the Intel OpenMP runtime, which in our testing produced numerical issues (though not actual crashes). The shim library is then created as follows:

$ cd ${R_PREFIX}/lib64/R/lib/mkl/thr
$ icc -shared -o libRblas.so shim.c -lmkl_gnu_thread -lmkl_core -lmkl_intel_lp64
$ ln -s libRblas.so libRlapack.so
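Because shim.c references no MKL symbols, it is worth confirming that a built shim actually records the MKL libraries as dependencies and that they resolve at runtime. If the linker on a given system defaults to --as-needed, unreferenced libraries may be dropped from the shim; adding -Wl,--no-as-needed to the link command should prevent that. The MKL library directory must also be findable by the runtime linker (for instance via LD_LIBRARY_PATH) when R starts. The sketch below relies on ldd. The helper name check_shim is hypothetical.

```shell
# Hypothetical helper (not from the original setup): check that shared
# object $1 resolves all its dependencies and pulls in each library named
# by the remaining arguments (matched as substrings of ldd output).
check_shim() {
    local shim=$1 deps lib status=0
    shift
    deps=$(ldd "$shim") || return 1
    # "not found" means the runtime linker could not locate a dependency,
    # e.g. because the MKL library directory is not on LD_LIBRARY_PATH.
    if printf '%s\n' "$deps" | grep -q 'not found'; then
        printf '%s\n' "$deps" | grep 'not found' >&2
        status=1
    fi
    for lib in "$@"; do
        if ! printf '%s\n' "$deps" | grep -qF -- "$lib"; then
            echo "check_shim: $shim does not pull in $lib" >&2
            status=1
        fi
    done
    return "$status"
}
```

For example:

$ check_shim ${R_PREFIX}/lib64/R/lib/mkl/seq/libRblas.so libmkl_sequential libmkl_core
$ check_shim ${R_PREFIX}/lib64/R/lib/mkl/thr/libRblas.so libmkl_gnu_thread libmkl_core libmkl_intel_lp64

For the threaded shim, ldd ${R_PREFIX}/lib64/R/lib/mkl/thr/libRblas.so | grep iomp5 should print nothing if the Intel OpenMP runtime was avoided as intended.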
Last modified: 2018-12-10 11:54 by frey