HPCC with open64 compiler, ACML and base FFT
Make
Here are some modifications made on Mills, based on the recommendations in Using ACML (AMD Core Math Library) In High Performance Computing Challenge (HPCC).
Changes to the make file `hpl/setup/Make.Linux_ATHLON_FBLAS`, copied to `hpl/Make.open64-acml`:
- Comment out lines beginning with `MP` or `LA`
- Change `/usr/bin/gcc` to `mpicc`
- Change `/usr/bin/g77` to `mpif77`
- Append `-DHPCC_FFT_235` to `CCFLAGS`

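The four edits listed above could be scripted with `sed`. The following is only a sketch: it works on a small stand-in file in a hypothetical `hpl-demo` directory, and the stand-in lines are illustrative rather than a copy of the real `Make.Linux_ATHLON_FBLAS`. Check the result by hand before using the same expressions on the real file.

```shell
#!/bin/sh
# Sketch only: apply the four edits to a stand-in makefile in ./hpl-demo.
# The stand-in lines below are illustrative, not the full original file.
mkdir -p hpl-demo
cat > hpl-demo/Make.Linux_ATHLON_FBLAS <<'EOF'
MPdir        = /usr/local/mpi
LAinc        =
LAlib        = $(LAdir)/libf77blas.a $(LAdir)/libatlas.a
CC           = /usr/bin/gcc
CCFLAGS      = $(HPL_DEFS) -O3
LINKER       = /usr/bin/g77
EOF

# 1-2: comment out lines starting with MP or LA
# 3-4: swap the compiler and linker for the MPI wrappers
# 5:   append the FFT define to the CCFLAGS line
sed -e 's/^MP/#&/' \
    -e 's/^LA/#&/' \
    -e 's|/usr/bin/gcc|mpicc|' \
    -e 's|/usr/bin/g77|mpif77|' \
    -e '/^CCFLAGS/ s/$/ -DHPCC_FFT_235/' \
    hpl-demo/Make.Linux_ATHLON_FBLAS > hpl-demo/Make.open64-acml

cat hpl-demo/Make.open64-acml
```

Writing to a new file instead of editing in place keeps the original template available for comparison with `diff`.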
The Valet commands are:
- `vpkg_devrequire acml/5.2.0-open64-fma4`
- `vpkg_devrequire openmpi/1.6.1-open64`

Exported variables (these supply values for the commented-out `LAinc` and `LAlib`):
- `export LAinc="$CPPFLAGS"`
- `export LAlib="$LDFLAGS -lacml"`

Make command with 4 parallel jobs:
make -j 4 arch=open64-acml
runs: N = 30000
Packages: `acml/5.2.0-open64-fma4`, `open64/4.5`, `openmpi/1.4.4-open64`
N = 30000, NB = 100, P = 6, Q = 8
These runs need 48 processes (a 6 x 8 process grid: P = 6 process rows and Q = 8 process columns). The same number of processes is run whether 48 or 96 slots are requested.
Options | Grid Engine | MPI flags |
---|---|---|
NCPU=1 | `-pe openmpi 48` | `--bind-to-core` |
NCPU=2 | `-pe openmpi 96` | `--bind-to-core --bycore --cpus-per-proc 2 -np 48` |
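
The two rows of the table follow from the grid size: the process count is P x Q, and the slot request is that count multiplied by NCPU. Here is a minimal sketch of that arithmetic, assuming a hypothetical helper named `launch_line` that only prints the values and does not submit a job:

```shell
#!/bin/sh
# Sketch: derive the Grid Engine slot request and the MPI flags for a P x Q grid.
# launch_line is a hypothetical helper; it prints text and launches nothing.
launch_line() {
    p=$1; q=$2; ncpu=$3
    np=$((p * q))            # one MPI process per cell of the process grid
    slots=$((np * ncpu))     # slots to request from Grid Engine
    if [ "$ncpu" -eq 1 ]; then
        flags="--bind-to-core"
    else
        flags="--bind-to-core --bycore --cpus-per-proc $ncpu -np $np"
    fi
    echo "-pe openmpi $slots | mpirun $flags"
}

launch_line 6 8 1
launch_line 6 8 2
```

With more than one core per process, `-np` has to be given explicitly, because the slot count no longer equals the number of MPI processes.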
HPCC benchmark results for two runs:
Result | NCPU=1 | NCPU=2 |
---|---|---|
HPL_Tflops | 0.0769491 | 1.54221 |
StarDGEMM_Gflops | 1.93686 | 14.6954 |
SingleDGEMM_Gflops | 11.5042 | 15.6919 |
MPIRandomAccess_LCG_GUPs | 0.0195047 | 0.00352421 |
MPIRandomAccess_GUPs | 0.0194593 | 0.00410853 |
StarRandomAccess_LCG_GUPs | 0.0113424 | 0.0302748 |
SingleRandomAccess_LCG_GUPs | 0.0448261 | 0.0568664 |
StarRandomAccess_GUPs | 0.0113898 | 0.0288637 |
SingleRandomAccess_GUPs | 0.0521811 | 0.053262 |
StarFFT_Gflops | 0.557555 | 1.14746 |
SingleFFT_Gflops | 1.2178 | 1.45413 |
MPIFFT_Gflops | 5.31624 | 34.3552 |
runs: N = 72000
Packages: `acml/5.3.0-open64-fma4`, `open64/4.5`, and `openmpi/1.4.4-open64` or `openmpi/1.6.1-open64`
N = 72000, NB = 100, P = 12, Q = 16
nproc = 2x192 (384 slots, with each of the 192 MPI workers bound to a Bulldozer core pair)
The two runs differ mainly in the use of QLogic PSM endpoints:
Result | PSM (v1.4.4) | PSM (v1.6.1) |
---|---|---|
HPL_Tflops | 1.68496 | 2.08056 |
StarDGEMM_Gflops | 14.6933 | 14.8339 |
SingleDGEMM_Gflops | 15.642 | 15.536 |
PTRANS_GBs | 9.25899 | 18.4793 |
StarFFT_Gflops | 1.19982 | 1.25452 |
StarSTREAM_Triad | 3.62601 | 3.65631 |
SingleFFT_Gflops | 1.44111 | 1.44416 |
MPIFFT_Gflops | 7.67835 | 77.603 |
RandomlyOrderedRingLatency_usec | 65.8478 | 2.44898 |
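
To compare the two columns, it can help to express each result as a ratio. This is a small sketch using `awk` for the floating-point division. The helper name `ratio` is hypothetical, and the numbers are copied from the table above.

```shell
#!/bin/sh
# Sketch: ratio of two benchmark values, rounded to two decimals.
# ratio is a hypothetical helper; awk performs the floating-point division.
ratio() {
    awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Throughput metrics: v1.6.1 column divided by v1.4.4 column (higher is better)
echo "HPL_Tflops:    $(ratio 2.08056 1.68496)"
echo "PTRANS_GBs:    $(ratio 18.4793 9.25899)"
echo "MPIFFT_Gflops: $(ratio 77.603 7.67835)"

# Latency: v1.4.4 column divided by v1.6.1 column (lower latency is better)
echo "RandomlyOrderedRingLatency_usec: $(ratio 65.8478 2.44898)"
```

With the table values, the ratios are about 1.23 for HPL, 2.00 for PTRANS, and 10.11 for MPIFFT, and the ring latency is about 26.89 times lower. The node-local results (StarDGEMM, SingleDGEMM, StarSTREAM) are nearly unchanged between the two runs.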
more
N = 72000, NB = 100, P = 12, Q = 16, NP = 384

Two sets of packages were added to the environment:
- `acml/5.2.0-open64-fma4`, `open64/4.5`, `openmpi/1.4.4-open64`
- `acml/5.3.0-open64-fma4`, `open64/4.5`, `openmpi/1.6.1-open64`

Result | 139765 | 145105 |
---|---|---|
HPL_Tflops | 1.54221 | 0.364243 |
StarDGEMM_Gflops | 14.6954 | 13.6194 |
SingleDGEMM_Gflops | 15.6919 | 15.453 |
PTRANS_GBs | 1.14913 | 1.07982 |
MPIRandomAccess_GUPs | 0.00410853 | 0.00679052 |
StarSTREAM_Triad | 3.39698 | 2.83863 |
StarFFT_Gflops | 1.14746 | 0.737805 |
SingleFFT_Gflops | 1.45413 | 1.3756 |
MPIFFT_Gflops | 34.3552 | 32.3555 |
RandomlyOrderedRingLatency_usec | 77.9332 | 76.9595 |