White Papers
Some of the content in this area will be in PDF format and may need to be downloaded before being read.
Mills: threading performance study
The behavior of the Mills cluster's cutting-edge Interlagos processor is studied under multi-threaded and multi-process workloads. The influence of compiler and BLAS/LAPACK library choice is presented.
Mills: AMD Opteron 6200 Unix Tuning Guide
The nodes on the Mills cluster have 2 or 4 AMD Opteron 6200 series sockets. Each socket is organized as a multi-chip module package with two CPU dies, interconnected using a HyperTransport link. Each die is organized as 3 core pairs (Interlagos modules). Thus, to the OS, each socket appears as 12 logical CPUs (12-core sockets). Resources such as memory and floating-point units are shared between the cores.
This technical tuning guide is intended for "systems admins, application end-users, and developers on a Linux platform who perform application development, code tuning, optimization, and initial system installation". The document describes resource sharing and its effect on your applications.
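The core-count arithmetic in the socket description above can be checked with a short sketch. The function and constant names are hypothetical, chosen only for illustration. The one-FPU-per-module figure is an assumption about the Interlagos design; the text here says only that floating-point units are shared.

```python
# Hypothetical helpers: count the CPUs implied by the topology described above.
DIES_PER_SOCKET = 2    # two CPU dies per multi-chip module package
MODULES_PER_DIE = 3    # three core pairs (Interlagos modules) per die
CORES_PER_MODULE = 2   # each module is a pair of cores


def logical_cpus(sockets: int) -> int:
    """Return the number of logical CPUs the OS sees on a node."""
    if sockets < 1:
        raise ValueError("a node needs at least one socket")
    return sockets * DIES_PER_SOCKET * MODULES_PER_DIE * CORES_PER_MODULE


def shared_fpus(sockets: int) -> int:
    """Return the number of FPUs, ASSUMING one shared FPU per module."""
    if sockets < 1:
        raise ValueError("a node needs at least one socket")
    return sockets * DIES_PER_SOCKET * MODULES_PER_DIE


if __name__ == "__main__":
    for sockets in (2, 4):  # Mills nodes have 2 or 4 sockets
        print(sockets, "sockets:", logical_cpus(sockets), "logical CPUs,",
              shared_fpus(sockets), "shared FPUs (assumed)")
```

Under that assumption, a floating-point-heavy job that runs one thread per logical CPU has two threads competing for each FPU, which is the kind of resource sharing the tuning guide addresses.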
HPC Challenge Awards Competition at SC Conference
The SC High Performance Computing Challenge includes the following benchmarks:
- HPL measures the floating-point rate of execution for solving a linear system of equations
- DGEMM measures the floating-point rate of execution of double-precision real matrix-matrix multiplication
- STREAM measures sustainable memory bandwidth (in GB/s) and the corresponding computation rate for a simple vector kernel
- PTRANS (parallel matrix transpose) exercises communications between pairs of processors. It is a useful test of the total communications capacity of the network.
- Random Access measures the rate of integer random updates of memory (GUPS)
- FFT measures the floating-point rate of execution of a double-precision complex one-dimensional Discrete Fourier Transform (DFT)
- Communication bandwidth and latency measures latency and bandwidth of a number of simultaneous communication patterns; based on b_eff (effective bandwidth benchmark).
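To make the units in the list above concrete, the following sketch converts a problem size and a wall-clock time into the rates that three of the benchmarks report. It is not the HPC Challenge code. The function names are hypothetical, and the operation counts are the conventional ones, assumed here rather than taken from this page: 2n^3 floating-point operations for an n x n matrix multiply, and 24 bytes moved per element for a triad-style vector kernel on 8-byte doubles.

```python
def _check_time(seconds: float) -> None:
    """Reject non-positive timings, which would give meaningless rates."""
    if seconds <= 0:
        raise ValueError("elapsed time must be positive")


def dgemm_gflops(n: int, seconds: float) -> float:
    """GFLOP/s for an n x n double-precision multiply (assumes 2*n**3 flops)."""
    _check_time(seconds)
    return 2.0 * n**3 / seconds / 1e9


def stream_triad_gbs(n: int, seconds: float) -> float:
    """GB/s for a[i] = b[i] + q*c[i] over n doubles (assumes 3 arrays x 8 bytes)."""
    _check_time(seconds)
    return 3 * 8 * n / seconds / 1e9


def gups(updates: int, seconds: float) -> float:
    """Giga-updates per second for a random-access run."""
    _check_time(seconds)
    return updates / seconds / 1e9


if __name__ == "__main__":
    # Hypothetical timings, for illustration only; these are not measured results.
    print("DGEMM :", dgemm_gflops(4000, 8.0), "GFLOP/s")
    print("STREAM:", stream_triad_gbs(50_000_000, 0.1), "GB/s")
    print("GUPS  :", gups(1_000_000_000, 40.0))
```

The real benchmarks do their own timing and validation; the sketch shows only how a measured time maps onto the reported rate.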
Matlab: Computational threads on a shared cluster
By default, Matlab uses multiple computational threads for standard linear algebra calculations. Unless it is started with the -singleCompThread option, it uses libraries tuned for the computational hardware. Examples are the sunperf library on Solaris (Strauss) and the MKL library on Intel-compatible (x86-64) hardware, including Mills.
To make full use of the computational threads, you must call the built-in high-level functions or data-parallel constructs in Matlab. For example, it is easy to write loops to do a matrix multiply, but it w
Mills: Using ACML In High Performance Computing Challenge
For Mills, the recommended libraries include OpenMPI, ACML, and FFTW. The AMD-recommended compilers include Open64 and PGI. The following document from AMD includes instructions for installing these libraries, but installation is not needed on Mills because they are already available as VALET packages.