====== HPCC with Intel compiler, MKL and base FFT ======
===== Make =====
Start by downloading and extracting the hpcc-1.4.3 directory:
curl -s http://icl.cs.utk.edu/projectsfiles/hpcc/download/hpcc-1.4.3.tar.gz | tar zx
The ''hpcc-1.4.3'' directory contains all the files needed to run the benchmark. The remaining work
is to adapt the build setup for the Intel compiler and MKL on Farber, where the software environment is configured with VALET.
Copy the make file ''hpl/setup/Make.Linux_ATHLON_FBLAS'' to ''hpl/Make.intel-mkl'' and edit the copy:
- Comment out the lines beginning with ''MP'' or ''LA''
- Change ''/usr/bin/gcc'' to ''mpicc''
- Change ''/usr/bin/g77'' to ''mpif77''
- Change ''CCFLAGS'' to ''-mkl -O3 -fno-alias -DHPCC_FFT_235''
- Change ''LINKFLAGS'' to ''-mkl -nofor-main''
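The edits listed above can also be scripted. The following is a minimal sketch, not the procedure actually used for these runs. It assumes the template defines ''CC'', ''LINKER'', ''CCFLAGS'' and ''LINKFLAGS'' at the start of a line, and it keeps a leading ''$(HPL_DEFS)'' in ''CCFLAGS'' if one is present. The helper name ''patch_makefile'' and the sample input are hypothetical.

```shell
#!/bin/sh
# Sketch: apply the listed make file edits to text read from stdin.
# patch_makefile is a hypothetical helper name, not part of HPCC.
# The sed expressions, in order:
#   1. comment out lines beginning with MP or LA
#   2. replace /usr/bin/gcc with mpicc
#   3. replace /usr/bin/g77 with mpif77
#   4. set CCFLAGS (assumption: keep a leading $(HPL_DEFS) if present)
#   5. set LINKFLAGS
patch_makefile() {
  sed -E \
    -e 's/^(MP|LA)/#&/' \
    -e 's|/usr/bin/gcc|mpicc|' \
    -e 's|/usr/bin/g77|mpif77|' \
    -e 's|^(CCFLAGS[[:space:]]*=[[:space:]]*(\$\(HPL_DEFS\))?).*|\1 -mkl -O3 -fno-alias -DHPCC_FFT_235|' \
    -e 's|^(LINKFLAGS[[:space:]]*=).*|\1 -mkl -nofor-main|'
}

# Demonstration on a small stand-in fragment (not the real template).
patch_makefile <<'EOF'
MPdir        = /usr/local/mpi
LAlib        = $(LAdir)/libf77blas.a
CC           = /usr/bin/gcc
CCFLAGS      = $(HPL_DEFS) -O3
LINKER       = /usr/bin/g77
LINKFLAGS    = $(CCFLAGS)
EOF
```

With the function defined, the real file could be produced with ''patch_makefile < hpl/setup/Make.Linux_ATHLON_FBLAS > hpl/Make.intel-mkl''. Review the result by hand before building.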
The VALET commands to load the compiler and MPI library are:
vpkg_devrequire intel
vpkg_devrequire openmpi/1.8.2-intel64
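Export the following variables to supply values for ''LAinc'' and ''LAlib'', which were commented out in the make file (''make'' picks them up from the environment):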
export LAinc="$CPPFLAGS"
export LAlib="$LDFLAGS -nofor-main"
Build with 4 parallel make jobs:
make -j 4 arch=intel-mkl
==== runs: N = 30000 ====
package ''intel/2015.0.090''
package ''openmpi/1.8.2-intel64''
N = 30000, NB = 200, P = 5, Q = 8
These runs use 40 MPI processes arranged in a 5 x 8 process grid (P = 5 process rows, Q = 8 process columns). The job was run with 40 slots, one per process.
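The process count and the size of the HPL matrix follow directly from N, P and Q. The sketch below assumes the usual double-precision HPL matrix of N x N elements at 8 bytes each, and ignores workspace and the memory used by the other HPCC tests. The helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of HPCC.

# Number of MPI processes in a P x Q grid.
hpl_procs() { echo $(( $1 * $2 )); }

# Bytes needed for an N x N matrix of 8-byte doubles.
hpl_matrix_bytes() { echo $(( 8 * $1 * $1 )); }

N=30000
P=5
Q=8
procs=$(hpl_procs "$P" "$Q")
bytes=$(hpl_matrix_bytes "$N")

echo "processes:         $procs"
echo "matrix bytes:      $bytes"
echo "bytes per process: $(( bytes / procs ))"
```

For this run the matrix alone is about 7.2 GB, or 180 MB per process.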
^ WEB NAME ^ VALUE ^ UNITS ^
| G-HPL | 0.6201 | TeraFlops/Sec |
| G-PTRANS | 0.0127 | TeraBytes/Sec |
| G-RandomAccess | 0.0789 | GigaUpdates/Sec |
| G-FFT | 0.0222 | TeraFlops/Sec |
| EP-STREAM Sys | 0.1638 | TeraBytes/Sec |
| EP-STREAM Triad | 4.0951 | GigaBytes/Sec |
| EP-DGEMM | 14.7898 | GigaFlops/Sec |
| RandomRing Bandwidth | 0.5619 | GigaBytes/Sec |
| RandomRing Latency | 2.1133 | micro-seconds |
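Two quick consistency checks on the table can be sketched as follows. The first assumes that EP-STREAM Sys is the per-process Triad rate multiplied by the number of processes, which matches the numbers reported here. The second divides G-HPL by the process count to get a per-process rate that can be set beside EP-DGEMM. The function names are hypothetical.

```shell
#!/bin/sh
# Values copied from the results table for the N = 30000 run.
procs=40
triad_gbs=4.0951    # EP-STREAM Triad, GB/s per process
hpl_tflops=0.6201   # G-HPL, TFlop/s for the whole job

# Aggregate STREAM bandwidth in TB/s (assumption: Sys = processes x Triad).
stream_sys_tbs() {
  awk -v n="$1" -v t="$2" 'BEGIN { printf "%.4f\n", n * t / 1000 }'
}

# HPL rate per process in GFlop/s.
hpl_per_proc_gflops() {
  awk -v f="$1" -v n="$2" 'BEGIN { printf "%.2f\n", f * 1000 / n }'
}

echo "EP-STREAM Sys (TB/s):       $(stream_sys_tbs "$procs" "$triad_gbs")"
echo "HPL per process (GFlop/s):  $(hpl_per_proc_gflops "$hpl_tflops" "$procs")"
```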
qacct values:
ru_wallclock 130.716
ru_utime 5114.084
ru_stime 39.425
maxvmem 25.174G
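The accounting values give a rough CPU utilization for the job. This sketch assumes that ''ru_utime'' and ''ru_stime'' are summed over all 40 processes, and compares that total with the wallclock time multiplied by the slot count. The function name is hypothetical.

```shell
#!/bin/sh
# Rough CPU utilization in percent: (user + system) / (wallclock x slots).
# Assumption: the CPU times are totals over all processes in the job.
cpu_efficiency() {
  # $1 = ru_utime, $2 = ru_stime, $3 = ru_wallclock, $4 = slots
  awk -v u="$1" -v s="$2" -v w="$3" -v n="$4" \
    'BEGIN { printf "%.1f\n", 100 * (u + s) / (w * n) }'
}

echo "CPU utilization (%): $(cpu_efficiency 5114.084 39.425 130.716 40)"
```

A value close to 100 suggests the slots were busy for nearly the whole run.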
==== runs: N = 72000 ====
package ''acml/5.3.0-open64-fma4''
package ''open64/4.5''
package ''openmpi/1.4.4-open64'' or ''openmpi/1.6.1-open64''
N = 72000, NB = 100, P = 12, Q = 16
nproc = 2 x 192 (384 slots, with each of the 192 MPI workers bound to a Bulldozer core pair)
The two runs differ mainly in whether QLogic PSM endpoints are used.
^ Result ^ Without PSM (v1.4.4) ^ With PSM (v1.6.1) ^
| HPL_Tflops | 1.68496 | 2.08056 |
| StarDGEMM_Gflops | 14.6933 | 14.8339 |
| SingleDGEMM_Gflops | 15.642 | 15.536 |
| PTRANS_GBs | 9.25899 | 18.4793 |
| StarFFT_Gflops | 1.19982 | 1.25452 |
| StarSTREAM_Triad | 3.62601 | 3.65631 |
| SingleFFT_Gflops | 1.44111 | 1.44416 |
| MPIFFT_Gflops | 7.67835 | 77.603 |
| RandomlyOrderedRingLatency_usec | 65.8478 | 2.44898 |
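To compare the two runs, it can help to express selected rows as ratios between the v1.6.1 and v1.4.4 columns. The following sketch does this for three rows, using values copied from the table. The helper name ''ratio'' is hypothetical.

```shell
#!/bin/sh
# Print a / b with two decimal places.
ratio() {
  awk -v a="$1" -v b="$2" 'BEGIN { printf "%.2f\n", a / b }'
}

# Throughput rows: v1.6.1 divided by v1.4.4 (higher is better).
echo "HPL_Tflops ratio:  $(ratio 2.08056 1.68496)"
echo "PTRANS_GBs ratio:  $(ratio 18.4793 9.25899)"

# Latency row: v1.4.4 divided by v1.6.1 (lower latency is better).
echo "Ring latency reduction factor: $(ratio 65.8478 2.44898)"
```

The same helper can be applied to the remaining rows of the table.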