software:lapack:caviness

already have.</note>
  
===== Compiling with Intel and MKL library =====
  
The [[https://software.intel.com/en-us/parallel-studio-xe/|Intel Parallel Studio XE]] suite comes with a Fortran compiler and the MKL library.  Use the VALET command **''vpkg_versions intel''** to find the latest version installed on Caviness.
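
For example, to list every installed version (the exact list will vary as packages are updated):
<code>
vpkg_versions intel
</code>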
  
<note tip>**Versions:**
You can get the package name and infer the update number from VALET, but you may also need the version of the compiler and the version of the LAPACK interfaces supported in the MKL component of the package.  See the [[https://software.intel.com/en-us/documentation|Intel Documentation]] site and look at the complete specifications in the [[https://software.intel.com/en-us/articles/intel-parallel-studio-xe-release-notes-and-new-features|release notes and features]].
  
   Update 2 - February 2014
==== VALET and ifort ====
  
Assuming you have the **''dgels-ex.f''** Fortran 77 source file, use VALET and the appropriate compile commands to compile the source file into an executable that links with the MKL library. Remember that VALET will choose the default version of the Intel compiler suite if you do not specify a version.
  
<code>
workgroup -g <<investing-entity>>
vpkg_devrequire intel
ifort -mkl dgels-ex.f -o dgels-ex
</code>
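
You can also pin a specific version rather than taking the default; for example, using the version that appears in the test output later on this page (any version listed by ''vpkg_versions intel'' works the same way):
<code>
vpkg_devrequire intel/2018u4
</code>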
<note tip>**Using make**: This is a simple compile command, but you may want to be prepared for more complicated projects with multiple source files, libraries, and compiler flags.  Here are the commands to run the same compile (using [[https://www.gnu.org/software/make/manual/html_node/Implicit-Rules.html#Implicit-Rules|make's implicit rules]])
<code>
vpkg_devrequire intel
export FC=ifort
export FFLAGS=-mkl
make dgels-ex
</code>
</note>
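
With ''FC'' and ''FFLAGS'' exported, GNU make's built-in ''%: %.f'' rule expands to the same compile command as before; a dry run shows the command without building it (exact spacing can vary by make version):
<code>
make -n dgels-ex
# prints: ifort -mkl dgels-ex.f -o dgels-ex
</code>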
==== sbatch file to test ====
  
The ''ifort'' compiler with the flag ''-mkl'' will compile and link to the threaded MKL libraries.  Thus you should test in the threaded parallel environment, and export the number of cores allocated to the job to the ''MKL_NUM_THREADS'' environment variable. Remember to use our templates for threaded jobs, found in ''/opt/shared/templates/slurm/generic/threads.qs'', as a starting point.
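For example, you might copy the template into your working directory and edit it from there (a sketch; adjust paths to your own directory layout):
<code>
cp /opt/shared/templates/slurm/generic/threads.qs test.qs
</code>
Here is a simple ''test.qs'' based on the ''threads.qs'' template.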
<file bash test.qs>
#!/bin/bash -

# Sections of this script that can/should be edited are delimited by a
# [EDIT] tag.  All Slurm job options are denoted by a line that starts
# with "#SBATCH " followed by flags that would otherwise be passed on
# the command line.  Slurm job options can easily be disabled in a
# script by inserting a space in the prefix, e.g. "# SBATCH " and
# reenabled by deleting that space.

# This is a batch job template for a program using multiple processor
# cores/threads on a single node.  This includes programs with OpenMP
# parallelism or explicit threading via the pthreads library.

# Do not alter the --nodes/--ntasks options!
#SBATCH --nodes=1
#SBATCH --ntasks=1

# [EDIT] Indicate the number of processor cores/threads to be used
#        by the job:

#SBATCH --cpus-per-task=4

# [EDIT] All jobs have memory limits imposed.  The default is 1 GB per
#        CPU allocated to the job.  The default can be overridden either
#        with a per-node value (--mem) or a per-CPU value (--mem-per-cpu)
#        with unitless values in MB and the suffixes K|M|G|T denoting
#        kibi, mebi, gibi, and tebibyte units.  Delete the space between
#        the "#" and the word SBATCH to enable one of them:

# SBATCH --mem=8G
# SBATCH --mem-per-cpu=1024M

# .... more options not used ....

# [EDIT] It can be helpful to provide a descriptive (terse) name for
#        the job (be sure to use quotes if there's whitespace in the
#        name):

#SBATCH --job-name=dgels-ex

# [EDIT] The partition determines which nodes can be used and with what
#        maximum runtime limits, etc.  Partition limits can be displayed
#        with the "sinfo --summarize" command.

# SBATCH --partition=standard

#        To run with priority-access to resources owned by your workgroup,
#        use the "_workgroup_" partition:

#SBATCH --partition=_workgroup_

# [EDIT] The maximum runtime for the job; a single integer is interpreted
#        as a number of minutes, otherwise use the format

#          d-hh:mm:ss

#        Jobs default to the default runtime limit of the chosen partition
#        if this option is omitted.

#SBATCH --time=0-02:00:00

#        You can also provide a minimum acceptable runtime so the scheduler
#        may be able to run your job sooner.  If you do not provide a
#        value, it will be set to match the maximum runtime limit (discussed
#        above).

# SBATCH --time-min=0-01:00:00

# .... more options not used ....

# Do standard OpenMP environment setup:

. /opt/shared/slurm/templates/libexec/openmp.sh


# [EDIT] Execute your OpenMP/threaded program using the srun command:
#
  
 echo "--- Set environment ---" echo "--- Set environment ---"
-source /opt/shared/valet/docs/valet.sh +vpkg_require intel
-vpkg_require intel/14.0.2-64bit+
  
 echo "" echo ""
-echo "--- Run Test with $NSLOTS threads ---" +echo "--- Run Test with $SLURM_CPUS_PER_TASK threads ---" 
-export MKL_NUM_THREADS=$NSLOTS +export MKL_NUM_THREADS=$SLURM_CPUS_PER_TASK 
-time ./$JOB_NAME < $JOB_NAME.d+time ./$SLURM_JOB_NAME < $SLURM_JOB_NAME.d
  
 echo "" echo ""
 echo "--- Compare Results ---" echo "--- Compare Results ---"
-cat $JOB_NAME.r+cat $SLURM_JOB_NAME.r
 </file> </file>
  
==== Test result output ====
<code>
[traine@login01 nagex]$ workgroup -g it_css
[(it_css:traine)@login01 nagex]$ sbatch test.qs
Submitted batch job 6718859
[(it_css:traine)@login01 nagex]$ more slurm-6718859.out
-- OpenMP job setup complete:
--  OMP_THREAD_LIMIT     = 4
--  OMP_PROC_BIND        = true
--  OMP_PLACES           = cores
--  MP_BLIST             = 0,1,2,3

--- Set environment ---
Adding package `intel/2018u4` to your environment
  
--- Run Test with 4 threads ---
 DGELS Example Program Results

 Least squares solution
      1.5339     1.8707    -1.5241     0.0392

 Square root of the residual sum of squares
      2.22E-02
  
real    0m1.043s
user    0m0.007s
sys     0m0.049s
  
--- Compare Results ---
      2.22E-02
</code>
  
<note important>Sub-second timing results are not reliable.  This test is not a benchmark and was meant to show that
  
  * Programs with small arrays will not benefit from the multi-threaded library, and may suffer a bit from the system overhead of maintaining multiple threads.
  * Sequential programs are better suited for running simultaneous instances.  You could run ''n'' copies of the program on the same node, where ''n'' is the number of cores on that node, and get better throughput when you compile them to be sequential.  (Too many threads on the same node will contend for limited resources.)
  * You may be able to take control of the parallelism in your program with OpenMP compiler directives, as sketched below.  This is easiest if you use the single-threaded MKL in your parallel regions.  See [[https://software.intel.com/en-us/articles/recommended-settings-for-calling-intelr-mkl-routines-from-multi-threaded-applications|recommended settings for calling Intel MKL routines from multi-threaded applications]].
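
As a sketch of that last point (assuming the same Intel compiler used above): either link the sequential MKL so each MKL call runs single-threaded inside your own OpenMP regions, or keep the threaded MKL and restrict it to one thread at run time:
<code>
# Link against the single-threaded (sequential) MKL:
ifort -qopenmp -mkl=sequential dgels-ex.f -o dgels-ex

# ...or keep the threaded MKL but restrict it to one thread,
# letting your own OpenMP regions supply the parallelism:
export OMP_NUM_THREADS=$SLURM_CPUS_PER_TASK
export MKL_NUM_THREADS=1
</code>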
  
  
          