====== Compiling and testing LAPACK on Mills ======

The NAG site has a [[http://www.nag.com/lapack-ex/|collection of examples]] to test LAPACK drivers.  A driver routine calls the necessary lower level routines to solve one particular problem.  For example, a real linear least squares problem is solved by the ''dgels'' driver.   Driver routines may not be in all LAPACK libraries, but you can download drivers from  [[http://www.netlib.org/lapack/lug/node25.html|netlib driver routines]].  The source of the driver routine is useful for learning how to use the lower level routines.

===== Getting the example files =====

Each example in the NAG collection has a source file, an input file, and an output result file, which should match your result.

  * ''dgels-ex.f''  - The Fortran 77 source file
  * ''dgels-ex.d''  - The input data file to be read from unit 5 (standard input)
  * ''dgels-ex.r''  - Should match the output on unit 6 (standard output)

You can use **''wget''** to get these with the script:

<code bash>
if [ ! -f "dgels-ex.f" ]; then
  wget http://www.nag.com/lapack-ex/examples/source/dgels-ex.f 
  wget http://www.nag.com/lapack-ex/examples/data/dgels-ex.d 
  wget http://www.nag.com/lapack-ex/examples/results/dgels-ex.r 
else 
  touch "dgels-ex.f"
fi
</code>
<note tip>You can just type the three **''wget''** commands in your terminal window, but it is a good idea to save them in a file for later reference.  In this case, you should enclose them in a conditional **''if''** statement to avoid downloading a file you
already have.</note>
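
The script above only checks for the source file before downloading all three.  A minimal sketch that checks each file separately might look like the following.  The function names ''nag_url'' and ''fetch_example'' are made up for this illustration, and the URL layout is assumed to follow the three **''wget''** commands above.

<code bash>
# Hypothetical helpers (not part of any NAG or Mills tooling).

# Map an example file name to its URL, assuming the directory layout of
# the wget commands above: .f -> source, .d -> data, .r -> results.
nag_url() {
  local file=$1 dir
  case "$file" in
    *.f) dir=source ;;
    *.d) dir=data ;;
    *.r) dir=results ;;
    *)   echo "unknown file type: $file" >&2; return 1 ;;
  esac
  echo "http://www.nag.com/lapack-ex/examples/$dir/$file"
}

# Download the source, data, and result files for one example,
# skipping any file that already exists in the current directory.
fetch_example() {
  local name=$1 ext file url
  for ext in f d r; do
    file="$name-ex.$ext"
    if [ -f "$file" ]; then
      echo "have $file"
    else
      url=$(nag_url "$file") || return 1
      wget "$url"
    fi
  done
}
</code>

After sourcing these functions, ''fetch_example dgels'' would fetch only the missing files for the ''dgels'' example.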

===== Compiling with Intel and MKL library =====

The [[https://software.intel.com/en-us/intel-composer-xe/|Intel Composer XE Suites]] come with a Fortran compiler and the MKL library.  Use the VALET command **''vpkg_versions intel''** to find the latest version installed on Mills - ''Version 2013 (2.144)''.

<note tip>**Versions:**
The newest package name is //Intel Composer XE SP1// and both update 1 and update 2 are installed on Mills.  You can get the package name and infer the update number from VALET, but you may also need the version of the compiler and the version of the LAPACK interfaces supported in the MKL component of the package.  See the [[https://software.intel.com/| Intel Development]] site.  From the [[https://software.intel.com/en-us/articles/intel-composer-release-notes-by-version|release notes]] for Linux Fortran 2013sp1:

   Update 2 - February 2014
     Intel Fortran Compiler updated to 14.0.2
     Intel Math Kernel Library updated to 11.1 Update 2

and the details on the main product page for MKL 11.1,
   LAPACK 3.4.1 interfaces and enhancements
</note>
==== VALET and ifort ====

Assuming you have the **''dgels-ex.f''** Fortran 77 source file, use the VALET and compile commands to
compile the source file to an executable that links with the MKL library.

<code>
vpkg_devrequire intel/14.0.2-64bit
ifort -mkl dgels-ex.f -o dgels-ex
</code>

The ''**ifort**'' compiler has an ''**-mkl**'' flag that controls linking with the MKL library.  From the man page or ''**ifort --help**'':

<code>
   -mkl[=<arg>]
          link to the Intel(R) Math Kernel Library (Intel(R) MKL) and bring
          in the associated headers
            parallel   - link using the threaded Intel(R) MKL libraries. This
                         is the default when -mkl is specified
            sequential - link using the non-threaded Intel(R) MKL libraries

            cluster    - link using the Intel(R) MKL Cluster libraries plus
                         the sequential Intel(R) MKL libraries
</code>
<note tip>**Using make**: This is a simple compiler command, but you may want to prepare for more complicated projects with multiple source files, libraries and compiler flags.  Here are the commands to run the same compile using [[https://www.gnu.org/software/make/manual/html_node/Implicit-Rules.html#Implicit-Rules|make's implicit rules]]:
<code>
vpkg_devrequire intel/14.0.2-64bit
export FC=ifort
export FFLAGS=-mkl
make dgels-ex
</code>
</note>
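
The help text above lists ''parallel'' (the default) and ''sequential'' variants of the ''-mkl'' flag.  As a rough sketch, a script could pick the variant from the number of slots it expects to run with.  The helper names ''mkl_flag'' and ''compile_command'' are hypothetical, and the rule "one slot means sequential" is an assumption for illustration, not a Mills policy.

<code bash>
# Hypothetical helper: choose an -mkl variant from a slot count.
# The variant names (parallel, sequential) come from the ifort help text.
mkl_flag() {
  local slots=${1:-1}          # default to one slot if none is given
  if [ "$slots" -gt 1 ]; then
    echo "-mkl=parallel"
  else
    echo "-mkl=sequential"
  fi
}

# Print (not run) the compile command for a given slot count.
compile_command() {
  echo "ifort $(mkl_flag "$1") dgels-ex.f -o dgels-ex"
}
</code>

With these functions sourced, ''export FFLAGS=$(mkl_flag 1)'' followed by ''make dgels-ex'' would request the sequential libraries through make's implicit rules.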
==== qsub file to test ====

The ''ifort'' compiler with flag ''-mkl'' will compile and link to the threaded MKL libraries.  Thus you should test in the threaded parallel environment, and export the number of slots to the ''MKL_NUM_THREADS'' environment variable.
<file bash test.qs>
#$ -N dgels-ex
#$ -pe threads 4

echo "--- Set environment ---"
source /opt/shared/valet/docs/valet.sh
vpkg_require intel/14.0.2-64bit

echo ""
echo "--- Run Test with $NSLOTS threads ---"
export MKL_NUM_THREADS=$NSLOTS
time ./$JOB_NAME < $JOB_NAME.d

echo ""
echo "--- Compare Results ---"
cat $JOB_NAME.r
</file>
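
The batch script prints the reference file so you can compare the results by eye.  A possible alternative is to save the program output to a file and let **''diff''** do the comparison.  The sketch below uses a hypothetical helper named ''compare_results''; it assumes that ignoring whitespace differences is acceptable, since Fortran output may differ from the reference file in blanks.

<code bash>
# Hypothetical helper: compare a program's output file with a reference file.
# diff -b ignores changes in the amount of whitespace, including
# trailing blanks at the end of a line.
compare_results() {
  local actual=$1 expected=$2
  if diff -b "$actual" "$expected" > /dev/null; then
    echo "PASS: $actual matches $expected"
  else
    echo "FAIL: $actual differs from $expected"
    return 1
  fi
}
</code>

In the batch script this could be used as ''./$JOB_NAME < $JOB_NAME.d > $JOB_NAME.out'' followed by ''compare_results $JOB_NAME.out $JOB_NAME.r''.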

==== Test result output ====
<code>
--- Set environment ---
WARNING: 'gcc' was not found
Adding package `intel/2013-2.144-64bit` to your environment

--- Run Test with 4 threads ---
 DGELS Example Program Results
 
 Least squares solution
      1.5339     1.8707    -1.5241     0.0392
 
 Square root of the residual sum of squares
      2.22E-02

real	0m0.966s
user	0m0.003s
sys	0m0.031s

--- Compare Results ---
 DGELS Example Program Results

 Least squares solution
      1.5339     1.8707    -1.5241     0.0392

 Square root of the residual sum of squares
      2.22E-02
</code>

<note warning>**WARNING: 'gcc' was not found**:  The release notes for this version of the Intel Composer suite indicate
that GNU **''gdb''** is included for debugging, which requires **''gcc''**.  This warning can be ignored if you are not debugging with **''gdb''** in the batch script. You will not get this warning on the head node, since the system version of **''gcc''** will always be found in your path. 

If you are debugging on the compute nodes or want to remove the warning, add
<code>
vpkg_require gcc/4.6
</code>
before the intel ''vpkg_require'' command in your batch script file.
</note>

<note important>Sub-second timing results are not reliable.  This test is not a benchmark; it is meant to show that
you can compile and link a program to read the data file, call the LAPACK routine ''**dgels**'', and reproduce the correct results.</note>

==== Sequential vs parallel ====

This example used the default parallel MKL libraries.  The LAPACK library is a collection of routines that parallelize nicely (for large problems), and MKL is an optimized multi-threaded library.  For large problems you get the best performance with the default.  However, there are three important considerations when using MKL.

  * Programs with small arrays will not benefit from the multi-threaded library, and may suffer a bit from the system overhead of maintaining multiple threads.
  * Sequential programs are better suited for running simultaneous instances.  You could run 12 copies of the program on the same node with better throughput when you compile them to be sequential.  (Too many threads on the same node will contend for limited resources.)
  * You may be able to take control of the parallelism in your program with OpenMP compiler directives.  This is easiest if you are using the single-threaded MKL in your parallel regions. See [[https://software.intel.com/en-us/articles/recommended-settings-for-calling-intelr-mkl-routines-from-multi-threaded-applications|recommended settings for calling intel MKL routines from multi threaded applications]].
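
The second consideration above, running several sequential instances on one node, can be sketched as a small shell function.  The name ''run_copies'' and the output file naming are hypothetical; the sketch sets ''MKL_NUM_THREADS'' to 1 so that each copy uses a single MKL thread even if the program was linked with the threaded libraries.

<code bash>
# Hypothetical helper: start N copies of a program in the background,
# each reading the same input file, then wait for all of them.
# Copy i writes its output to PREFIX.i.out.
run_copies() {
  local n=$1 program=$2 input=$3 prefix=$4 i
  export MKL_NUM_THREADS=1     # one MKL thread per copy
  i=1
  while [ "$i" -le "$n" ]; do
    "$program" < "$input" > "$prefix.$i.out" &
    i=$((i + 1))
  done
  wait                         # block until every copy has finished
}
</code>

For example, ''run_copies 12 ./dgels-ex dgels-ex.d run'' would start 12 copies and leave the results in ''run.1.out'' through ''run.12.out''.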

===== Compiling with PGI and ACML library =====

The [[http://developer.amd.com/tools-and-sdks/cpu-development/amd-core-math-library-acml//|AMD core math library (ACML)]] is from AMD developers, and is thus a good choice for the Mills chip set. Use the VALET command **''vpkg_versions acml''** to find the latest version installed on Mills - ''5.3.0''.

<note tip>**Versions:**
From the release notes in the file ''/opt/shared/ACML/5.3.0/ReleaseNotes''
   New features of release 5.3.0 of ACML
     Updated the LAPACK code to version 3.4.0.
</note>