
White Papers

Some of the content in this area will be in PDF format and may need to be downloaded before being read.

The R statistical computing software can be built atop a variety of BLAS and LAPACK libraries – including its own internal Rblas and Rlapack libraries. Creating alternate builds of R that vary ONLY in the identity of the underlying BLAS/LAPACK implementation can consume extremely large amounts of disk space (and time!). The runtime-configurable R BLAS/LAPACK whitepaper documents the scheme used on our latest HPC cluster to make the choice of library a runtime configurable option.

The behavior of the Mills cluster's cutting-edge Interlagos processor is studied under multi-threaded and multi-process workloads. Influences of compiler and BLAS/LAPACK library choice are presented.

Download the PDF

The nodes on the Mills cluster have 2 or 4 AMD Opteron 6200 series sockets. Each socket is a multi-chip module package with two CPU dies interconnected by a HyperTransport link. Each die is organized as 3 core pairs (Interlagos modules). Thus, to the OS, each socket appears as 12 logical CPUs (a 12-core socket). Resources such as memory and floating-point units are shared between the cores.
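
The core counts above follow from simple multiplication. The short Python sketch below only restates that arithmetic. The function names are hypothetical, and the one-FPU-per-core-pair figure is an assumption based on the statement that floating-point units are shared between cores.

```python
# Illustrative arithmetic only; the counts come from the paragraph above.
DIES_PER_SOCKET = 2    # two CPU dies per multi-chip module package
MODULES_PER_DIE = 3    # three core pairs (Interlagos modules) per die
CORES_PER_MODULE = 2   # each module is a pair of cores


def cores_per_socket():
    """Logical CPUs the OS sees for one socket."""
    return DIES_PER_SOCKET * MODULES_PER_DIE * CORES_PER_MODULE


def cores_per_node(sockets):
    """Logical CPUs for a node with the given socket count (2 or 4 on Mills)."""
    if sockets not in (2, 4):
        raise ValueError("Mills nodes have 2 or 4 sockets")
    return sockets * cores_per_socket()


def shared_fpus_per_socket():
    """Assumption: each core pair shares a single floating-point unit."""
    return DIES_PER_SOCKET * MODULES_PER_DIE


if __name__ == "__main__":
    for sockets in (2, 4):
        print(sockets, "sockets:", cores_per_node(sockets), "logical CPUs")
```

Under that assumption, a job that runs one floating-point-heavy thread per logical CPU has two threads contending for each floating-point unit, which is the kind of resource sharing the tuning guide discusses.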

This technical tuning guide is intended for "systems admins, application end-users, and developers on a Linux platform who perform application development, code tuning, optimization, and initial system installation". The document describes resource sharing and its effect on your applications.

Download the PDF from the AMD developer site

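The SC (International Conference for High Performance Computing, Networking, Storage and Analysis) High Performance Computing Challenge includes the following benchmarks: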

  1. HPL - measures the floating point rate of execution for solving a linear system of equations
  2. DGEMM - measures the floating point rate of execution of double precision real matrix-matrix multiplication
  3. STREAM - measures sustainable memory bandwidth (in GB/s) and the corresponding computation rate for a simple vector kernel
  4. PTRANS (parallel matrix transpose) - exercises communications between pairs of processors. It is a useful test of the total communications capacity of the network.
  5. Random Access - measures the rate of integer random updates of memory (GUPS)
  6. FFT - measures the floating point rate of execution of double precision complex one-dimensional Discrete Fourier Transform (DFT)
  7. Communication bandwidth and latency - measures latency and bandwidth of a number of simultaneous communication patterns; based on b_eff (effective bandwidth benchmark).
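
As a rough illustration of how some of these rates are derived from a problem size and an elapsed time, the Python sketch below uses the customary operation counts: 2n^3 floating point operations for an n-by-n matrix multiply, and three 8-byte array accesses per element for a triad-style vector kernel. The function names are hypothetical and this is not the benchmark suite's own code.

```python
def dgemm_gflops(n, seconds):
    """GFLOP/s for an n x n double precision matrix-matrix multiply.

    Uses the conventional count of 2 * n**3 floating point operations.
    """
    return 2.0 * n ** 3 / seconds / 1e9


def triad_gb_per_s(n, seconds):
    """GB/s for a triad kernel a[i] = b[i] + q * c[i] over n doubles.

    Counts two 8-byte reads and one 8-byte write per element.
    """
    return 3 * 8 * n / seconds / 1e9


def gups(updates, seconds):
    """Giga-updates per second for a random memory update test."""
    return updates / seconds / 1e9


if __name__ == "__main__":
    print("DGEMM :", dgemm_gflops(1000, 1.0), "GFLOP/s")
    print("Triad :", triad_gb_per_s(10 ** 9, 24.0), "GB/s")
    print("Random:", gups(2e9, 1.0), "GUPS")
```

The elapsed times here are made-up inputs chosen to give round numbers; real measurements come from running the benchmarks themselves.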

Download the PDF

By default, Matlab uses multiple computational threads for standard linear algebra calculations. Unless it is started with the -singleCompThread option, it uses libraries tuned for the computational hardware. Examples are the sunperf library on Solaris (Strauss) and the MKL library on Intel-compatible hardware, including Mills.

To fully use the computational threads, you must call the built-in high-level functions or data-parallel constructs in Matlab. For example, it is easy to write loops to do a matrix multiply, but it w

For Mills, the recommended libraries include OpenMPI, ACML, and FFTW. The AMD-recommended compilers include Open64 and PGI. The following document from AMD includes instructions for installing these libraries, but installation is not needed on Mills because they are already installed as VALET packages.

Download the PDF from the AMD developer site

The International Conference for High Performance Computing, Networking, Storage and Analysis
  • Last modified: 2018-12-10 11:00
  • by frey