
The XSEDE HPC workshop took a serial version of the Laplace example and developed three parallel solutions from it, using OpenMP, MPI, and OpenACC. This page describes how to compile and run the serial version and the three parallel solutions on Farber with different compilers.

Copy the examples to your home directory on Farber with the following command:

cp -r ~trainf/Exercises .
If you are benchmarking, consider adding -l exclusive=1 to qlogin so that no other jobs run on your node during your interactive session.

GCC Fortran and C

Commands to type after login and starting a workgroup shell for GCC Fortran:

qlogin
cd Exercises/Serial/
vpkg_devrequire gcc/4.9
gfortran laplace_serial.f90
time ./a.out
exit

The Total time reported by the program is the user CPU time. The time command also reports the wall clock time, labeled real.

[(it_css:trainf)@n038 Serial]$ time ./a.out
 Maximum iterations [100-4000]?
4000
 ---------- Iteration number:          100  ---------------
( 995, 995): 63.33  ( 996, 996): 72.67  ( 997, 997): 81.40  ( 998, 998): 88.97  ( 999, 999): 94.86  (1000,1000): 98.67

                    -------- more iteration progress reports --------
                      
 ---------- Iteration number:         3300  ---------------
( 995, 995): 97.66  ( 996, 996): 98.24  ( 997, 997): 98.75  ( 998, 998): 99.19  ( 999, 999): 99.56  (1000,1000): 99.87
 Max error at iteration         3372  was   9.99533103575345194E-003
 Total time was    34.876698      seconds.

real    0m39.333s
user    0m34.876s
sys     0m0.002s
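
The real value above also counts the seconds spent typing the iteration count at the prompt, which is one likely reason it exceeds user. A minimal sketch for removing that delay, assuming the program reads the count from standard input as the prompt suggests; run_laplace is a hypothetical helper and not part of the workshop files:

```shell
# Hypothetical helper (not in Exercises/): supply the iteration count on
# stdin so that no time is spent waiting at the prompt.
# Usage: run_laplace PROGRAM [ITERATIONS]
run_laplace() {
    prog=$1
    iters=${2:-4000}   # 4000 is the value typed in the sessions on this page
    printf '%s\n' "$iters" | "$prog"
}
```

In bash, `time run_laplace ./a.out 4000` then times the run without the typing delay.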

Commands to type after login and starting a workgroup shell for GCC C:

qlogin
cd Exercises/Serial/
vpkg_devrequire gcc/4.9
gcc laplace_serial.c -lm
time ./a.out
exit

The Total time reported by the program is the user CPU time. The time command also reports the wall clock time, labeled real.

[(it_css:trainf)@n038 Serial]$ time ./a.out
 Maximum iterations [100-4000]?
4000
 ---------- Iteration number:          100  ---------------
[995,995]: 63.33  [996,996]: 72.67  [997,997]: 81.40  [998,998]: 88.97  [999,999]: 94.86  [1000,1000]: 98.67

                    -------- more iteration progress reports --------
                      
 ---------- Iteration number:         3300  ---------------
[995,995]: 97.66  [996,996]: 98.24  [997,997]: 98.75  [998,998]: 99.19  [999,999]: 99.56  [1000,1000]: 99.87

Max error at iteration 3372 was 0.009995
Total time was 32.101587 seconds.

real    0m34.803s
user    0m32.123s
sys     0m0.004s

By default, gcc and gfortran do not perform any optimization. Adding the options -O3 -ffast-math produces run times comparable to those of the Intel compilers, which optimize by default:

qlogin
cd Exercises/Serial/
vpkg_devrequire gcc/4.9
gfortran -O3 -ffast-math laplace_serial.f90
time ./a.out
exit

The Total time reported by the program is the user CPU time. The time command also reports the wall clock time, labeled real.

[(it_css:trainf)@n038 Serial]$ time ./a.out
 Maximum iterations [100-4000]?
4000
 ---------- Iteration number:          100  ---------------
( 995, 995): 63.33  ( 996, 996): 72.67  ( 997, 997): 81.40  ( 998, 998): 88.97  ( 999, 999): 94.86  (1000,1000): 98.67

                    -------- more iteration progress reports --------
                     
 ---------- Iteration number:         3300  ---------------
( 995, 995): 97.66  ( 996, 996): 98.24  ( 997, 997): 98.75  ( 998, 998): 99.19  ( 999, 999): 99.56  (1000,1000): 99.87
 Max error at iteration         3372  was    9.9953310357463465E-003
 Total time was    4.87525797      seconds.

real    0m7.105s
user    0m4.871s
sys     0m0.005s
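
To put the effect of the optimization flags in numbers, the two gfortran Total times on this page can be divided. A small sketch using awk; speedup is a hypothetical helper name:

```shell
# Hypothetical helper: ratio of a baseline time to an improved time.
# Usage: speedup BASELINE_SECONDS NEW_SECONDS
speedup() {
    awk -v base="$1" -v new="$2" 'BEGIN { printf "%.2f\n", base / new }'
}

# gfortran default (34.876698 s) vs. -O3 -ffast-math (4.87525797 s)
speedup 34.876698 4.87525797   # prints 7.15
```

The ratio describes only these two runs on one node; a fair comparison would repeat each run several times.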

Intel Fortran and C

Commands to type after login and starting a workgroup shell for Intel Fortran:

qlogin
cd Exercises/Serial/
vpkg_devrequire intel/2016
ifort laplace_serial.f90
time ./a.out
exit

The Total time reported by the program is the user CPU time. The time command also reports the wall clock time, labeled real.

[(it_css:trainf)@n038 Serial]$ time ./a.out
 Maximum iterations [100-4000]?
4000
 ---------- Iteration number:          100  ---------------
( 995, 995): 63.33  ( 996, 996): 72.67  ( 997, 997): 81.40  ( 998, 998): 88.97  ( 999, 999): 94.86  (1000,1000): 98.67

                    -------- more iteration progress reports --------
                      
 ---------- Iteration number:         3300  ---------------
( 995, 995): 97.66  ( 996, 996): 98.24  ( 997, 997): 98.75  ( 998, 998): 99.19  ( 999, 999): 99.56  (1000,1000): 99.87
 Max error at iteration         3372  was   9.995331035753452E-003
 Total time was    6.477016      seconds.

real    0m8.816s
user    0m6.473s
sys     0m0.007s

Commands to type after login and starting a workgroup shell for Intel C:

qlogin
cd Exercises/Serial/
vpkg_devrequire intel/2016
icc laplace_serial.c
time ./a.out
exit

The Total time reported by the program is the user CPU time. The time command also reports the wall clock time, labeled real.

[(it_css:trainf)@n038 Serial]$ time ./a.out
 Maximum iterations [100-4000]?
4000
 ---------- Iteration number:          100  ---------------
[995,995]: 63.33  [996,996]: 72.67  [997,997]: 81.40  [998,998]: 88.97  [999,999]: 94.86  [1000,1000]: 98.67

                    -------- more iteration progress reports --------
                      
 ---------- Iteration number:         3300  ---------------
[995,995]: 97.66  [996,996]: 98.24  [997,997]: 98.75  [998,998]: 99.19  [999,999]: 99.56  [1000,1000]: 99.87

Max error at iteration 3372 was 0.009995
Total time was 17.156667 seconds.

real    0m19.921s
user    0m17.162s
sys     0m0.009s

Consider using the -O0 option (capital letter 'O' followed by the number zero) with the Intel compilers to turn off optimization, especially while you are debugging and testing your code for correctness.

Commands to type after login to start a workgroup shell and run the OpenMP solution with GCC Fortran:

workgroup -g it_css
cd Exercises/OpenMP/Solutions/
qlogin -pe threads 4
export OMP_NUM_THREADS=4
vpkg_devrequire gcc/4.9
gfortran -fopenmp laplace_omp.f90
time ./a.out
exit

The Total time reported by the program is the user CPU time. The time command also reports the wall clock time, labeled real.

[(it_css:trainf)@n036 Solutions]$ time ./a.out
 Maximum iterations [100-4000]?
4000
 ---------- Iteration number:          100  ---------------
( 995, 995): 63.33  ( 996, 996): 72.67  ( 997, 997): 81.40  ( 998, 998): 88.97  ( 999, 999): 94.86  (1000,1000): 98.67  
 
                    -------- more iteration progress reports --------
                       
 ---------- Iteration number:         3300  ---------------
( 995, 995): 97.66  ( 996, 996): 98.24  ( 997, 997): 98.75  ( 998, 998): 99.19  ( 999, 999): 99.56  (1000,1000): 99.87  
 Max error at iteration         3372  was    9.9953310357534519E-003
 Total time was    46.3349571      seconds.

real	0m16.459s
user	0m46.331s
sys	0m0.009s
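
Here user (46.3 s) exceeds real (16.5 s) because CPU time is summed over the four threads, so real is the figure to compare between thread counts. A hedged sketch for repeating the run with several thread counts; scan_threads is a hypothetical helper, it assumes the program reads the iteration count from standard input, and the counts should not exceed the slots requested with qlogin -pe threads:

```shell
# Hypothetical helper: run an OpenMP program once per thread count and
# report the wall clock time of each run in whole seconds.
# Usage: scan_threads PROGRAM COUNT...
scan_threads() {
    prog=$1
    shift
    for n in "$@"; do
        start=$(date +%s)
        # 4000 answers the "Maximum iterations" prompt
        printf '4000\n' | OMP_NUM_THREADS=$n "$prog"
        end=$(date +%s)
        echo "threads=$n wall=$((end - start))s"
    done
}
```

For example, `scan_threads ./a.out 1 2 4` inside a 4-slot session prints the program output followed by one summary line per thread count.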

Same for C, with the compile statement:

gcc -fopenmp laplace_omp.c -lm

Intel compilers

workgroup -g it_css
cd Exercises/OpenMP/Solutions/
qlogin -pe threads 4
export OMP_NUM_THREADS=4
vpkg_devrequire intel/2016
ifort -qopenmp laplace_omp.f90
time ./a.out
exit
  • The option -openmp was deprecated in intel/2016 and replaced by -qopenmp.

Same for Intel C, with the compile statement:

icc -qopenmp laplace_omp.c

Commands to type after login and starting a workgroup shell for the MPI solution in Fortran:

qlogin -pe mpi 4
cd Exercises/MPI/Solutions/
vpkg_require openmpi
mpifort laplace_mpi.f90
mpirun -np 4 ./a.out
exit

The Total time reported is from a high-resolution wall clock timer (MPI_Wtime, which is part of the MPI specification).

[(it_css:trainf)@n036 Solutions]$ mpirun -np 4 ./a.out
 Maximum iterations [100-4000]?
4000
 ---------- Iteration number:          100  ---------------
( 995, 995): 63.33  ( 996, 996): 72.67  ( 997, 997): 81.40  ( 998, 998): 88.97  ( 999, 999): 94.86  (1000,1000): 98.67  

 
                    -------- more iteration progress reports --------
                      

 ---------- Iteration number:         3300  ---------------
( 995, 995): 97.66  ( 996, 996): 98.24  ( 997, 997): 98.75  ( 998, 998): 99.19  ( 999, 999): 99.56  (1000,1000): 99.87  
 Max error at iteration         3372  was   9.99533095416893502E-003
 Total time was    9.3715754      seconds.

Same for C, with the compile statement:

mpicc laplace_mpi.c -lm

The OpenACC solution must be run on a node with a GPU accelerator card.

Commands to type after login and starting a workgroup shell for the OpenACC solution with PGI Fortran:

qlogin -l gpu
cd Exercises/OpenACC/Solutions/
vpkg_devrequire pgi/16
pgf90 -acc -ta=tesla laplace_acc.f90
time ./a.out
exit

The Total time reported by the program is the time on the GPU. The time command also reports the wall clock time, labeled real.

[(it_css:trainf)@n036 Solutions]$ time ./a.out
 Maximum iterations [100-4000]?
4000
 ---------- Iteration number:          100  ---------------
( 995, 995): 63.33  ( 996, 996): 72.67  ( 997, 997): 81.40  ( 998, 998): 88.97  ( 999, 999): 94.86  (1000,1000): 98.67  
 
                    -------- more iteration progress reports --------
                       
 ---------- Iteration number:         3300  ---------------
( 995, 995): 97.66  ( 996, 996): 98.24  ( 997, 997): 98.75  ( 998, 998): 99.19  ( 999, 999): 99.56  (1000,1000): 99.87  
 Max error at iteration         3372  was    9.9953310357534519E-003
 Total time was     1.192778      seconds.

real	0m5.095s
user	0m0.891s
sys	0m0.371s
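
The program's own timer covers only part of the wall clock time here. A small sketch that converts the real value printed by time into seconds and subtracts the program's Total time; to_seconds and untimed_seconds are hypothetical helper names, and what the remainder contains (waiting at the prompt, runtime and GPU start-up) is an assumption rather than something measured on this page:

```shell
# Convert a bash `time` value such as "0m5.095s" into seconds.
to_seconds() {
    echo "$1" | awk -F'm' '{ sub(/s$/, "", $2); printf "%.3f\n", $1 * 60 + $2 }'
}

# Wall clock time not covered by the program's own timer.
# Usage: untimed_seconds REAL_FROM_TIME PROGRAM_TOTAL_SECONDS
untimed_seconds() {
    real=$(to_seconds "$1")
    awk -v real="$real" -v total="$2" 'BEGIN { printf "%.2f\n", real - total }'
}

untimed_seconds 0m5.095s 1.192778   # prints 3.90
```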

Same for C, with the compile statement:

pgcc -acc -ta=tesla laplace_acc.c

Last modified: 2018-01-05 09:55 by anita