====== HPCC with Intel compiler, MKL and base FFT ======

===== Make =====

Start by downloading and extracting the ''hpcc-1.4.3'' directory:

  curl -s http://icl.cs.utk.edu/projectsfiles/hpcc/download/hpcc-1.4.3.tar.gz | tar zx

The ''hpcc-1.4.3'' directory has all the files you need to run the benchmark. Our job is to modify the setup for the Intel compiler and MKL on Farber, which uses VALET.

Copy the make file ''hpl/setup/Make.Linux_ATHLON_FBLAS'' to ''hpl/Make.intel-mkl'' and edit the copy:

  - Comment out the lines beginning with ''MP'' or ''LA''
  - Change ''/usr/bin/gcc'' to ''mpicc''
  - Change ''/usr/bin/g77'' to ''mpif77''
  - Change ''CCFLAGS'' to ''-mkl -O3 -fno-alias -DHPCC_FFT_235''
  - Change ''LINKFLAGS'' to ''-mkl -nofor-main''

The VALET commands are:

  vpkg_devrequire intel
  vpkg_devrequire openmpi/1.8.2-intel64

Exported variables (these supply values for the commented-out ''LAinc'' and ''LAlib''):

  export LAinc="$CPPFLAGS"
  export LAlib="$LDFLAGS -nofor-main"

Run make with 4 parallel jobs:

  make -j 4 arch=intel-mkl

==== runs: N = 30000 ====

  * package ''intel/2015.0.090''
  * package ''openmpi/1.8.2-intel64''
  * N = 30000, NB = 200, P = 5, Q = 8

These runs need 40 MPI processes, arranged as a P x Q = 5 x 8 process grid. The job is run with the same number of slots, 40.
^ WEB NAME ^ VALUE ^ UNITS ^
| G-HPL | 0.6201 | TeraFlops/Sec |
| G-PTRANS | 0.0127 | TeraBytes/Sec |
| G-RandomAccess | 0.0789 | GigaUpdates/Sec |
| G-FFT | 0.0222 | TeraFlops/Sec |
| EP-STREAM Sys | 0.1638 | TeraBytes/Sec |
| EP-STREAM Triad | 4.0951 | GigaBytes/Sec |
| EP-DGEMM | 14.7898 | GigaFlops/Sec |
| RandomRing Bandwidth | 0.5619 | GigaBytes/Sec |
| RandomRing Latency | 2.1133 | microseconds |

qacct values:

  ru_wallclock 130.716
  ru_utime     5114.084
  ru_stime     39.425
  maxvmem      25.174G

==== runs: N = 72000 ====

  * package ''acml/5.3.0-open64-fma4''
  * package ''open64/4.5''
  * package ''openmpi/1.4.4-open64'' or ''openmpi/1.6.1-open64''
  * N = 72000, NB = 100, P = 12, Q = 16
  * nproc = 2x192 (384 slots, with 192 MPI workers each bound to a Bulldozer core pair)

The two runs differ mainly in the use of QLogic PSM endpoints:

^ Result ^ no PSM (v1.4.4) ^ PSM (v1.6.1) ^
| HPL_Tflops | 1.68496 | 2.08056 |
| StarDGEMM_Gflops | 14.6933 | 14.8339 |
| SingleDGEMM_Gflops | 15.642 | 15.536 |
| PTRANS_GBs | 9.25899 | 18.4793 |
| StarFFT_Gflops | 1.19982 | 1.25452 |
| StarSTREAM_Triad | 3.62601 | 3.65631 |
| SingleFFT_Gflops | 1.44111 | 1.44416 |
| MPIFFT_Gflops | 7.67835 | 77.603 |
| RandomlyOrderedRingLatency_usec | 65.8478 | 2.44898 |
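
As a sanity check on the run parameters above, the following minimal sketch recomputes the number of MPI processes (P x Q) and the size of the HPL matrix (N x N double-precision values) for both runs. The function name ''hpl_summary'' is hypothetical and not part of HPCC. The sketch assumes 8 bytes per matrix element, and it only gives a lower bound on memory use, since the other HPCC tests allocate their own buffers.

```python
def hpl_summary(n, p, q, bytes_per_element=8):
    """Return the process count and matrix footprint for an HPL problem.

    n: problem size N (the matrix is n x n)
    p, q: process grid dimensions (P rows, Q columns)
    bytes_per_element: assumed to be 8 (double precision)
    """
    nprocs = p * q
    matrix_bytes = n * n * bytes_per_element
    gib = 2 ** 30
    return {
        "nprocs": nprocs,
        "matrix_gib": matrix_bytes / gib,
        "per_process_gib": matrix_bytes / nprocs / gib,
    }


# Parameters taken from the two run sections on this page.
runs = {
    "N = 30000": hpl_summary(30000, 5, 8),
    "N = 72000": hpl_summary(72000, 12, 16),
}

for name, s in runs.items():
    print(
        f"{name}: {s['nprocs']} processes, "
        f"matrix {s['matrix_gib']:.2f} GiB total, "
        f"{s['per_process_gib']:.3f} GiB per process"
    )
```

The process counts should match the 40 slots of the first run and the 192 MPI workers of the second.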