
HPCC with open64 compiler, ACML and base FFT

Here are some modifications made on Mills, based on the recommendations in "Using ACML (AMD Core Math Library) In High Performance Computing Challenge (HPCC)".

Changes to the makefile hpl/setup/Make.Linux_ATHLON_FBLAS, after copying it to hpl/Make.open64-acml:

  1. Comment out lines beginning with MP or LA
  2. Change /usr/bin/gcc to mpicc
  3. Change /usr/bin/g77 to mpif77
  4. Append -DHPCC_FFT_235 to CCFLAGS
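
The four makefile edits can be scripted with sed. This is a sketch, not the procedure used on Mills: the function name patch_hpl_makefile is hypothetical, and it assumes the makefile assigns CC, LINKER and CCFLAGS on single lines and spells the compilers literally as /usr/bin/gcc and /usr/bin/g77. Commenting out LAinc and LAlib is what lets make pick those values up from the environment variables exported later on this page.

```shell
# Hypothetical helper: apply the four edits to a copy of the HPL makefile.
# Usage: patch_hpl_makefile <source makefile> <destination makefile>
patch_hpl_makefile() {
  src=$1
  dst=$2
  # 1. comment out MP* and LA* lines (make then reads them from the environment)
  # 2./3. swap the serial compilers for the MPI wrappers
  # 4. append -DHPCC_FFT_235 to the (assumed single-line) CCFLAGS assignment
  sed -E \
    -e 's/^(MP|LA)/# &/' \
    -e 's#/usr/bin/gcc#mpicc#g' \
    -e 's#/usr/bin/g77#mpif77#g' \
    -e '/^CCFLAGS/ s/$/ -DHPCC_FFT_235/' \
    "$src" > "$dst"
}
```

Typical use, from the top-level HPCC directory: `patch_hpl_makefile hpl/setup/Make.Linux_ATHLON_FBLAS hpl/Make.open64-acml`, then inspect the result before building.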

The Valet commands are:

vpkg_devrequire acml/5.2.0-open64-fma4
vpkg_devrequire openmpi/1.6.1-open64

Export these variables to supply values for the commented-out LAinc and LAlib:

export LAinc="$CPPFLAGS"
export LAlib="$LDFLAGS -lacml"

Build with four parallel make jobs:

make -j 4 arch=open64-acml
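
The export and make steps can be wrapped in a small function that refuses to build when CPPFLAGS or LDFLAGS is empty, which would happen if the vpkg_devrequire commands had not been run first (the export lines above rely on those variables). This is a sketch: hpcc_build is a hypothetical name, and the MAKE override exists only so the function can be tried without running a real build.

```shell
# Hypothetical wrapper around the export + make steps.
# Usage: hpcc_build [arch] [jobs]   (defaults: open64-acml, 4)
hpcc_build() {
  arch=${1:-open64-acml}
  jobs=${2:-4}
  # vpkg_devrequire is expected to have set these; stop early if it has not
  if [ -z "$CPPFLAGS" ] || [ -z "$LDFLAGS" ]; then
    echo "CPPFLAGS/LDFLAGS are empty: run the vpkg_devrequire commands first" >&2
    return 1
  fi
  # Same values as the export lines; make reads them because the
  # LAinc/LAlib assignments in the makefile are commented out
  export LAinc="$CPPFLAGS"
  export LAlib="$LDFLAGS -lacml"
  # MAKE can be overridden (e.g. MAKE=echo) for a dry run
  ${MAKE:-make} -j "$jobs" arch="$arch"
}
```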

Packages used for the NCPU=1 and NCPU=2 runs:
package `acml/5.2.0-open64-fma4`
package `open64/4.5`
package `openmpi/1.4.4-open64`
N = 30000, NB = 100, P = 6, Q = 8

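These runs need 48 MPI processes, arranged as a P x Q = 6 x 8 process grid (6 process rows and 8 process columns). Both runs start the same 48 processes; the NCPU=2 run requests 96 slots so that each process is bound to two cores.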
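
As a quick sanity check on N, P and Q, the process count is P*Q and the N x N double-precision matrix occupies 8*N*N bytes. The sketch below assumes the matrix is spread roughly evenly over the P*Q processes and ignores HPL workspace and the memory the other HPCC tests use; hpl_grid_info is a hypothetical name.

```shell
# Hypothetical helper: process count and approximate HPL matrix bytes per process.
# Usage: hpl_grid_info N P Q
hpl_grid_info() {
  n=$1; p=$2; q=$3
  np=$((p * q))               # HPL runs one MPI process per grid cell
  bytes=$((8 * n * n / np))   # 8-byte doubles, N*N entries, shared over np
  echo "processes=$np matrix_bytes_per_process=$bytes"
}
```

For N = 30000 on the 6 x 8 grid this works out to 150000000 bytes (about 150 MB) per process; for N = 72000 on a 12 x 16 grid it is 216000000 bytes.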

^ Option ^ Grid Engine ^ MPI flags ^
| NCPU=1 | -pe openmpi 48 | --bind-to-core |
| NCPU=2 | -pe openmpi 96 | --bind-to-core --bycore --cpus-per-proc 2 -np 48 |
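
The slot request and mpirun flags in the table follow from two numbers: processes = P*Q and slots = processes * NCPU. The sketch below derives them; hpcc_launch_args is a hypothetical name, and the flag spellings are the Open MPI 1.4/1.6 ones used on this page (later Open MPI releases renamed these options).

```shell
# Hypothetical helper: Grid Engine slot request and mpirun flags for an HPL grid.
# Usage: hpcc_launch_args P Q NCPU
hpcc_launch_args() {
  p=$1; q=$2; ncpu=$3
  np=$((p * q))           # MPI processes = HPL process grid size
  slots=$((np * ncpu))    # slots = processes x cores bound to each
  echo "qsub: -pe openmpi $slots"
  if [ "$ncpu" -eq 1 ]; then
    # one process per slot, so no explicit -np is needed
    echo "mpirun: --bind-to-core"
  else
    echo "mpirun: --bind-to-core --bycore --cpus-per-proc $ncpu -np $np"
  fi
}
```

For example, `hpcc_launch_args 6 8 2` reproduces the NCPU=2 row, which in a job script would be a `#$ -pe openmpi 96` directive plus `mpirun --bind-to-core --bycore --cpus-per-proc 2 -np 48 ./hpcc`, with hpcc reading hpccinf.txt from the working directory. `hpcc_launch_args 12 16 2` gives the 384-slot, 192-process layout used for the larger runs.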

HPCC benchmark results for two runs:

^ Result ^ NCPU=1 ^ NCPU=2 ^
| HPL_Tflops | 0.0769491 | 1.54221 |
| StarDGEMM_Gflops | 1.93686 | 14.6954 |
| SingleDGEMM_Gflops | 11.5042 | 15.6919 |
| MPIRandomAccess_LCG_GUPs | 0.0195047 | 0.00352421 |
| MPIRandomAccess_GUPs | 0.0194593 | 0.00410853 |
| StarRandomAccess_LCG_GUPs | 0.0113424 | 0.0302748 |
| SingleRandomAccess_LCG_GUPs | 0.0448261 | 0.0568664 |
| StarRandomAccess_GUPs | 0.0113898 | 0.0288637 |
| SingleRandomAccess_GUPs | 0.0521811 | 0.053262 |
| StarFFT_Gflops | 0.557555 | 1.14746 |
| SingleFFT_Gflops | 1.2178 | 1.45413 |
| MPIFFT_Gflops | 5.31624 | 34.3552 |
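
Comparison tables like this one can be pulled from the hpccoutf.txt files HPCC writes. The sketch assumes the summary section of each output file lists every metric as a `name=value` line; hpcc_compare is a hypothetical name, and it prints a DokuWiki-style table with the two file paths as column headers.

```shell
# Hypothetical helper: side-by-side table of selected HPCC summary metrics.
# Usage: hpcc_compare <output file A> <output file B> METRIC...
hpcc_compare() {
  a=$1; b=$2; shift 2
  printf '^ Result ^ %s ^ %s ^\n' "$a" "$b"
  for key in "$@"; do
    # first "key=value" line in each file; the value is everything after "="
    va=$(grep -m1 "^$key=" "$a" | cut -d= -f2)
    vb=$(grep -m1 "^$key=" "$b" | cut -d= -f2)
    printf '| %s | %s | %s |\n' "$key" "$va" "$vb"
  done
}
```

A metric missing from one file simply leaves that cell empty.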

Packages used for the second set of runs:
package `acml/5.3.0-open64-fma4`
package `open64/4.5`
package `openmpi/1.4.4-open64` or `openmpi/1.6.1-open64`
N = 72000, NB = 100, P = 12, Q = 16
nproc = 2x192 (384 slots, with each of the 192 MPI processes bound to a Bulldozer core pair)

The two runs differ mainly in the use of QLogic PSM endpoints:

^ Result ^ No PSM (v1.4.4) ^ PSM (v1.6.1) ^
| HPL_Tflops | 1.68496 | 2.08056 |
| StarDGEMM_Gflops | 14.6933 | 14.8339 |
| SingleDGEMM_Gflops | 15.642 | 15.536 |
| PTRANS_GBs | 9.25899 | 18.4793 |
| StarFFT_Gflops | 1.19982 | 1.25452 |
| StarSTREAM_Triad | 3.62601 | 3.65631 |
| SingleFFT_Gflops | 1.44111 | 1.44416 |
| MPIFFT_Gflops | 7.67835 | 77.603 |
| RandomlyOrderedRingLatency_usec | 65.8478 | 2.44898 |
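
The size of the PSM effect is easiest to read as a ratio between the two columns. This is a minimal awk-based sketch; speedup is a hypothetical name. Pass the slower value first: for throughput metrics that is the No PSM column, for latency (where lower is better) it is the PSM column.

```shell
# Hypothetical helper: ratio other/base with two decimals.
# Usage: speedup BASE OTHER   (BASE must be non-zero)
speedup() {
  awk -v base="$1" -v other="$2" 'BEGIN { printf "%.2f\n", other / base }'
}
```

Applied to the table above, `speedup 1.68496 2.08056` prints 1.23 for HPL_Tflops, `speedup 7.67835 77.603` prints 10.11 for MPIFFT_Gflops, and `speedup 2.44898 65.8478` prints 26.89, i.e. the ring latency is about 27 times lower with PSM.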

N = 72000, NB = 100, P = 12, Q = 16, NP = 384

Packages for run 139765:
package `acml/5.2.0-open64-fma4`
package `open64/4.5`
package `openmpi/1.4.4-open64`

Packages for run 145105:
package `acml/5.3.0-open64-fma4`
package `open64/4.5`
package `openmpi/1.6.1-open64`

^ Result ^ 139765 ^ 145105 ^
| HPL_Tflops | 1.54221 | 0.364243 |
| StarDGEMM_Gflops | 14.6954 | 13.6194 |
| SingleDGEMM_Gflops | 15.6919 | 15.453 |
| PTRANS_GBs | 1.14913 | 1.07982 |
| MPIRandomAccess_GUPs | 0.00410853 | 0.00679052 |
| StarSTREAM_Triad | 3.39698 | 2.83863 |
| StarFFT_Gflops | 1.14746 | 0.737805 |
| SingleFFT_Gflops | 1.45413 | 1.3756 |
| MPIFFT_Gflops | 34.3552 | 32.3555 |
| RandomlyOrderedRingLatency_usec | 77.9332 | 76.9595 |
Last modified: 2018-05-08 13:25 by sraskar