Differences

This shows you the differences between two versions of the page.

--- software:lapack:caviness [2020-03-09 12:46] – [Test result output] anita
+++ software:lapack:caviness [2021-04-27 16:21] (current) – external edit 127.0.0.1
@@ Line 25: / Line 25: @@
 already have.</note>
-===== Compiling with intel and mkl library =====
+===== Compiling with Intel and MKL library =====
 The [[https://software.intel.com/en-us/parallel-studio-xe/|Intel Parallel Studio XE]] installation includes a Fortran compiler and the MKL library.  Use the VALET command **''vpkg_versions intel''** to find the latest version installed on Caviness.
@@ Line 45: / Line 45: @@
 <code>
+workgroup -g <<investing-entity>>
 vpkg_devrequire intel
 ifort -mkl dgels-ex.f -o dgels-ex
@@ Line 70: / Line 71: @@
 </code>
 </note>
-==== qsub file to test ====
+==== sbatch file to test ====
-The ''ifort'' compiler with the ''-mkl'' flag compiles and links against the threaded MKL libraries, so you should test in the threaded parallel environment and export the number of slots to the ''MKL_NUM_THREADS'' environment variable. Remember to start from our template for threaded jobs, ''/opt/templates/slurm/generic/threads.qs''. Here is a simple ''test.qs'' based on the ''threads.qs'' template.
+The ''ifort'' compiler with the ''-mkl'' flag compiles and links against the threaded MKL libraries, so you should test in the threaded parallel environment and export the number of slots to the ''MKL_NUM_THREADS'' environment variable. Remember to start from our template for threaded jobs, ''/opt/shared/templates/slurm/generic/threads.qs''. Here is a simple ''test.qs'' based on the ''threads.qs'' template.
 <file bash test.qs>
 #!/bin/bash -l
@@ Line 172: / Line 173: @@
 [(it_css:traine)@login01 nagex]$ more slurm-6718859.out
 -- OpenMP job setup complete:
---  OMP_NUM_THREADS      = 4
+--  OMP_THREAD_LIMIT     = 4
 --  OMP_PROC_BIND        = true
 --  OMP_PLACES           = cores
@@ Line 211: / Line 212: @@
   * Programs with small arrays will not benefit from the multi-threaded library, and may suffer a bit from the system overhead of maintaining multiple threads.
-  * Sequential programs are better suited for running simultaneous instances.  You could run 12 copies of the program on the same node with better throughput when you compile them to be sequential.  (Too many threads on the same node will contend for limited resources)
+  * Sequential programs are better suited for running simultaneous instances.  You could run ''n'' copies of the program on the same node, where ''n'' is the number of cores on that node, with better throughput when you compile them to be sequential.  (Too many threads on the same node will contend for limited resources)
   * You may be able to take control of the parallelism in your program with OpenMP compiler directives.  This is easiest if you are using the single-threaded MKL in your parallel regions. See [[https://software.intel.com/en-us/articles/recommended-settings-for-calling-intelr-mkl-routines-from-multi-threaded-applications|recommended settings for calling Intel MKL routines from multi-threaded applications]].
-===== Compiling with PGI and ACML library =====
-The [[http://developer.amd.com/tools-and-sdks/cpu-development/amd-core-math-library-acml//|AMD Core Math Library (ACML)]] is from AMD developers, and is thus a good choice for the Mills chip set. Use the VALET command **''vpkg_versions acml''** to find the latest version installed on Mills (''5.3.0'').
-<note tip>**Versions:**
-From the release notes in the file ''/opt/shared/ACML/5.3.0/ReleaseNotes''
-   New features of release 5.3.0 of ACML
-     Updated the LAPACK code to version 3.4.0.
-</note>