
Programming Environment

This section uses the wiki's documentation conventions.

There are two memory models for computing: distributed-memory and shared-memory. In the former, the message passing interface (MPI) is employed in programs to communicate between processors that each use their own memory address space. In the latter, open multiprocessing (OMP) programming techniques are employed for multiple threads (lightweight processes) to access memory in a common address space. When your job spans several compute nodes, you must use an MPI model.

Distributed memory systems use single-program multiple-data (SPMD) and multiple-program multiple-data (MPMD) programming paradigms. In the SPMD paradigm, each processor loads the same program image and executes it on data in its own address space (different data). It is the usual mechanism for MPI code: a single executable is available on each node (through a globally accessible file system such as $WORKDIR), and launched on each node (through the MPI wrapper command, mpirun).

The shared-memory programming model is used on Symmetric Multi-Processor (SMP) nodes such as a single typical compute node (20 or 24 cores, 64 GB memory). The programming paradigm for this memory model is called Parallel Vector Processing (PVP) or Shared-Memory Parallel Programming (SMPP). The former name is derived from the fact that vectorizable loops are often employed as the primary structure for parallelization. The main point of SMPP computing is that all of the processors in the same node share data in a single memory subsystem. There is no need for explicit messaging between processors as with MPI coding.

The SMPP paradigm employs compiler directives (as pragmas in C/C++ and special comments in Fortran) or explicit threading calls (e.g. with Pthreads). The majority of science codes now use OpenMP directives that are understood by most vendor compilers, as well as the GNU compilers.

In cluster systems that have SMP nodes and a high-speed interconnect between them, programmers often treat all CPUs within the cluster as having their own local memory. On a node, an MPI executable is launched on each processor core and runs within a separate address space. In this way, all processor cores appear as a set of distributed memory machines, even though each node has processor cores that share a single memory subsystem.

Clusters with SMPs sometimes employ hybrid programming to take advantage of higher performance at the node-level for certain algorithms that use SMPP (OMP) parallel coding techniques. In hybrid programming, OMP code is executed on the node as a single process with multiple threads (or an OMP library routine is called), while MPI programming is used at the cluster-level for exchanging data between the distributed memories of the nodes.
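As a rough sketch of how a hybrid run divides up the cores, the arithmetic below splits each node's cores between MPI processes and OpenMP threads. The node count and the number of MPI processes per node are assumptions chosen for illustration, and the launch line is only printed, not executed. How processes are actually placed on nodes depends on the scheduler and MPI settings, which this sketch does not cover.

```shell
# Hypothetical layout: 2 nodes with 20 cores each. The core count matches the
# typical node described above; the node count is an assumption.
nodes=2
cores_per_node=20
ranks_per_node=4    # MPI processes per node (assumed)

# Give each MPI process an equal share of its node's cores as OpenMP threads.
OMP_NUM_THREADS=$((cores_per_node / ranks_per_node))
export OMP_NUM_THREADS
total_ranks=$((nodes * ranks_per_node))

# Dry run: print the launch line instead of executing it.
launch_cmd="mpirun -np $total_ranks ./driver"
echo "OMP_NUM_THREADS=$OMP_NUM_THREADS $launch_cmd"
```

OMP_NUM_THREADS is the standard OpenMP environment variable that sets the number of threads each process starts.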

Fortran, C, C++, Java and Matlab programs should be compiled on the login node. However, if a compile is lengthy or needs extensive resources, you may need to schedule a job for compilation using salloc or sbatch, which will be billed to your allocation. All resulting executables should only be run on the compute nodes.

There are three 64-bit compiler suites that IT generally installs and supports: PGI CDK (Portland Group Inc.'s Cluster Development Kit), Intel Composer XE 2011, and GNU. In addition, IT has installed OpenJDK (Open Java Development Kit), which must only be used on the compute nodes. (Type vpkg_info openjdk for more information on OpenJDK.)

The PGI compilers exploit special features of AMD processors. If you use open-source compilers, we recommend the GNU collection.

You can use a VALET vpkg_require command to set the UNIX environment for the compiler suite you want to use. After you issue the corresponding vpkg_require command, the compiler path and supporting environment variables will be defined.

A general command for basic source code compilation is:

<compiler> <compiler_flags> <source_code_filename> -o <executable_filename>

For each compiler suite, the table below displays the compiler name, a link to documentation describing the compiler flags, and the appropriate filename extension for the source code file. The executable will be named a.out unless you use the -o <executable_filename> option.

To view the compiler option flags, their syntax, and a terse explanation, execute a compiler command with the -help option. Alternatively, read the compiler's man pages.

PGI
VALET command          Reference manuals    User guides
vpkg_require pgi       C, Fortran           C, Fortran

Compiler        Language           Common filename extensions
pgfortran       F90, F95, F2003    .f, .for, .f90, .f95
pgf77           F77                .f
pgCC            C++                .C, .cc
pgcc            C                  .c

Intel
VALET command          Reference manuals    User guides
vpkg_require intel     C, Fortran           C, Fortran

Compiler        Language           Common filename extensions
ifort           F77, F90, F95      .f, .for, .f90, .f95
icpc            C++                .C, .c, .cc, .cpp, .cxx, .c++, .i, .ii
icc             C                  .c

GCC
VALET command          Reference manuals    User guides
vpkg_require gcc       C, Fortran           C, Fortran

Compiler        Language           Common filename extensions
gfortran, f95   F77, F90, F95      .f, .f90, .f95
g++             C++                .C, .c, .cc, .cpp, .cxx, .c++, .i, .ii
gcc             C                  .c

This section uses the PGI compiler suite to illustrate simple Fortran and C compiler commands that create an executable. For each compiler suite, you must first set the UNIX environment so the compilers and libraries are available to you. VALET commands provide a simple way to do this.

The examples below show the compile and link steps in a single command. These illustrations use source code files named fdriver.f90 (Fortran 90) or cdriver.c (C). They all use the -o option to produce an executable named 'driver'. The optional -fpic PGI compiler flag generates position-independent code and creates smaller executables. You might also use code optimization option flags such as -fast after debugging your program.

You can use the -c option instead to create a .o object file that you would later link to other object files to create the executable.
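A minimal sketch of the separate compile and link steps is shown below as a dry run: it prints the commands rather than running them, so it works without the PGI environment loaded. The second source file, fstat.f90, is a hypothetical name used only to show more than one object file.

```shell
# Dry run: print a separate compile step for each source file, then one link
# step. fstat.f90 is a hypothetical second source file.
objects=""
for src in fdriver.f90 fstat.f90; do
    obj="${src%.f90}.o"                    # fdriver.f90 -> fdriver.o
    echo "pgfortran -fpic -c $src -o $obj"
    objects="${objects:+$objects }$obj"    # append, space-separated
done

# The link step takes the object files instead of the source files.
link_cmd="pgfortran $objects -o driver"
echo "$link_cmd"
```

To run the steps for real, remove the echo wrappers after issuing vpkg_require pgi.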

Some people use the UNIX make command to compile source code. There are many good online tutorials on the basics of using make. Also available is a cross-platform makefile generator, cmake. You can set the UNIX environment for cmake by typing the vpkg_require cmake command.

Using the PGI suite to illustrate:

First use a VALET command to set the environment:

 vpkg_require pgi

Then use that compiler suite's commands:

Fortran 90 example:
 pgfortran -fpic fdriver.f90 -o driver
C example:
 pgcc -fpic cdriver.c -o driver

If your program only uses OpenMP directives, has no message passing, and your target is a single SMP node, you should add the OpenMP compiler flag to the serial compiler flags.

Compiler suite    OpenMP compiler flag
PGI               -mp
Open64            -mp
Intel             -openmp
Intel-2016        -qopenmp
GCC               -fopenmp


Instead of using OpenMP directives in your program, you can add an OpenMP-based library. You will still need the OpenMP compiler flag when you use the library.
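The table of OpenMP flags can be captured in a small shell helper, sketched below. The function name openmp_flag and the lowercase suite keywords are illustrative choices, not VALET commands. The final line only prints a compile command as a dry run.

```shell
# Print the OpenMP flag for a compiler suite, following the table above.
# The function name and suite keywords are illustrative only.
openmp_flag() {
    case "$1" in
        pgi|open64) echo "-mp" ;;
        intel)      echo "-openmp" ;;
        intel-2016) echo "-qopenmp" ;;
        gcc)        echo "-fopenmp" ;;
        *)          echo "unknown compiler suite: $1" >&2; return 1 ;;
    esac
}

# Dry run: show the compile line for an OpenMP build with GCC.
echo "gcc $(openmp_flag gcc) cdriver.c -o driver"
```

A helper like this keeps the flag choice in one place when a build script has to support more than one compiler suite.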

MPI implementations

In the distributed-memory model, the message passing interface (MPI) allows programs to communicate between processors that use their own node's memory address space. It is the most commonly used library and runtime environment for building and executing distributed-memory applications on clusters of computers.

OpenMPI is the most desirable MPI implementation to use. It is the only one that works for job suspension, checkpointing, and task migration to other processors. These capabilities are needed to enable opportunistic use of idle nodes as well as to configure short-term and long-term queues.

Some software comes packaged with other MPI implementations that IT cannot change. In those cases, their VALET configuration files use the bundled MPI implementation. However, we recommend that you use OpenMPI whenever you need an MPI implementation.

MPI compiler wrappers

The OpenMPI implementation provides compiler wrappers for C, C++, Fortran 77, 90, and 95. These compiler wrappers add MPI support to the actual compiler suites by passing additional information to the compiler. You simply use the MPI compiler wrapper in place of the compiler name you would normally use.

The compiler suite that's used depends on your UNIX environment settings. Use VALET commands to simultaneously set your environment to use the OpenMPI implementation and to select a particular compiler suite. The commands for the four compiler suites are:

  vpkg_require openmpi/1.4.4-pgi
  vpkg_require openmpi/1.4.4-open64
  vpkg_require openmpi/1.4.4-intel64
  vpkg_require openmpi/1.4.4-gcc

(Type vpkg_versions openmpi to see if newer versions are available.)

The vpkg_require command selects the MPI and compiler suite combination, and then you may use the compiler wrapper commands repeatedly. The wrapper name depends only on the language used, not the compiler suite you choose: mpicc (C), mpicxx or mpic++ (C++), mpif77 (Fortran 77), and mpif90 (Fortran 90 and 95).

Fortran example:
vpkg_require openmpi/1.4.4-pgi
mpif90 -fpic fdriver.f90 -o driver
C example:
vpkg_require openmpi/1.4.4-pgi
mpicc -fpic cdriver.c -o driver

You may use other compiler flags listed in each compiler suite's documentation.
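Because the wrapper name depends only on the language, a build script can choose it with a simple lookup. The sketch below assumes a hypothetical helper named mpi_wrapper and made-up language keywords; the wrapper names themselves are the standard OpenMPI ones. The last line prints a compile command without running it.

```shell
# Print the OpenMPI compiler wrapper for a language.
# The function name and language keywords are illustrative only.
mpi_wrapper() {
    case "$1" in
        c)       echo "mpicc" ;;
        c++)     echo "mpicxx" ;;
        f77)     echo "mpif77" ;;
        f90|f95) echo "mpif90" ;;
        *)       echo "no wrapper known for: $1" >&2; return 1 ;;
    esac
}

# Dry run: the same line works whichever compiler suite vpkg_require selected.
echo "$(mpi_wrapper f90) -fpic fdriver.f90 -o driver"
```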

To modify the options used by the MPI wrapper commands, consult the FAQ section of the OpenMPI web site.

IT installs high-quality math and utility libraries that are used by many applications. These libraries provide highly optimized math packages and functions. To determine which compilers IT used to prepare a library version, use the vpkg_versions VALET command.

Here is a representative sample of installed libraries. Use the vpkg_list command to see the most current list of libraries.

Open-source libraries
  • ATLAS: Automatically Tuned Linear Algebra Software (portable)
  • FFTW: Discrete Fast Fourier Transform library
  • GOTOBLAS2: Enhanced BLAS routines from the Texas Advanced Computing Center (TACC)
  • HDF4 and HDF5: Hierarchical Data Format suite (file formats and libraries for storing and organizing large, numerical data collections)
  • HYPRE: High-performance preconditioners for linear system solvers (from LLNL)
  • LAPACK: Linear algebra routines
  • Matplotlib: Python-based 2D publication-quality plotting library
  • netCDF: network Common Data Form for creation, access and sharing of array-oriented scientific data
  • ScaLAPACK: Scalable LAPACK, a subset of LAPACK routines redesigned for distributed memory MIMD parallel computers using MPI
  • VTK: Visualization ToolKit, a platform for 3D computer graphics and visualization
Commercial libraries
  • AOCL: AMD Optimizing CPU Libraries (See AMD's AOCL User Guide.) AOCL is the successor to ACML.
  • IMSL: RogueWave's mathematical and statistical libraries
  • MKL: Intel's Math Kernel Library
  • NAG: Numerical Algorithms Group's numerical libraries

The libraries will be optimized for a given cluster architecture. Note that the calling sequences of some of the commercial library routines differ from their open-source counterparts.

Introduction

This section shows you how to link your program with libraries you or your colleagues have created or with centrally installed libraries such as ACML or FFTW. The examples introduce special environment variables (FFLAGS, CFLAGS, CPPFLAGS, LDFLAGS and LDLIBS) whose use simplifies the compile command. The VALET commands vpkg_require and vpkg_devrequire can easily define the working environment for your compiler suite choice.

Joint use of VALET and these environment variables will also prepare your UNIX environment to support your use of make for program development. VALET will accommodate using one or several libraries, and you can extend its functionality for software you develop or install.

You should use Intel MKL; it is a highly optimized BLAS/LAPACK library.

If you use the Intel compilers, you can add -mkl to your link command, e.g.

    ifort -o program -mkl=sequential [...]
 
    ifort -o program -qopenmp -mkl=parallel [...]

The former uses the serial library, the latter uses the threaded library that respects the OpenMP runtime environment of the job for multithreaded BLAS/LAPACK execution.

If you're not using the Intel compilers, you'll need to generate the appropriate compiler and linker options using Intel's online tool, the MKL Link Line Advisor.

Please use "dynamic linking" since that allows MKL to adjust the underlying kernel functions at runtime according to the hardware on which you're running. If you use static linking, you're tied to the lowest common hardware model available and you will usually not see as good performance.

You'll need to load a version of Intel into the environment before compiling/building and also at runtime, using a VALET command such as

    vpkg_require intel/2019

Among other things, this will set MKLROOT in the environment to the appropriate path, which the link advisor references. The MKL version (year) matches that of the compiler version (year).

To determine the available versions of Intel installed use

$ vpkg_versions intel

Fortran examples illustrated with the PGI compiler suite

Reviewing the basic compilation command

The general command for compiling source code:

«compiler» «compiler_flags» «source_code_filename» -o «executable_filename»

For example:

vpkg_require pgi
pgfortran -fpic fdriver.f90 -o driver

Using user-supplied libraries

To compile fdriver.f90 and link it to a shared F90 library named libfstat.so stored in $HOME/lib, add the library location and the library name (fstat) to the command:

pgfortran -fast -fpic -L$HOME/lib fdriver.f90 -lfstat -o driver

The -L option flag is for the shared library directory's name; the -l flag is for the specific library name.

You can simplify this compiler command by creating and exporting three special environment variables. FFLAGS represents a set of Fortran compiler option flags; LDFLAGS represents the location of your libraries; and LDLIBS represents the choice of libraries.

vpkg_require pgi
export FFLAGS='-fpic'
export LDFLAGS="-L$HOME/lib"
export LDLIBS='-lfstat'
pgfortran $FFLAGS $LDFLAGS fdriver.f90 $LDLIBS -o driver

Extending this further, you might have several libraries in one or more locations. In that case, list all of the '-l' flags in the LDLIBS statement, for example,

export LDLIBS='-lfstat -lfpoly'

and all of the '-L' flags in the LDFLAGS statement. (The order in which the '-L' directories appear in LDFLAGS determines the search order.)
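One detail worth checking when you set these variables is shell quoting. The sketch below shows that single quotes store $HOME literally, and it is not expanded again when the variable is later used on the compile line, while double quotes expand it at assignment time. The directory $HOME/lib/experimental is a hypothetical second library location, and the compile command is only printed.

```shell
# Single quotes store the text literally: $HOME is NOT expanded here, and it
# is not re-expanded later when the variable is used on a compile line.
literal='-L$HOME/lib'

# Double quotes expand $HOME when the variable is assigned.
expanded="-L$HOME/lib"

printf 'single quotes: %s\n' "$literal"
printf 'double quotes: %s\n' "$expanded"

# Several directories: the first -L listed is searched first.
# $HOME/lib/experimental is a hypothetical second location.
LDFLAGS="-L$HOME/lib/experimental -L$HOME/lib"
LDLIBS='-lfstat -lfpoly'    # no variables inside, so single quotes are fine
echo "pgfortran -fpic $LDFLAGS fdriver.f90 $LDLIBS -o driver"
```

Printing the command this way before running it is a quick check that every path has been expanded as intended.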

Using centrally supplied libraries (ACML, MKL, FFTW, etc.)

This extends the previous section's example by illustrating how to use VALET's vpkg_devrequire command to locate and link a centrally supplied library such as AMD's Core Math Library, ACML. Several releases (versions) of a library may be installed, and some may have been compiled with several compiler suites.

To view your choices, use VALET's vpkg_versions command:

vpkg_versions acml

The example below uses the acml/5.0.0-pgi-fma4 version, the single-threaded, ACML 5.0.0 FMA4 library compiled with the PGI 11 compilers. Since that version depends on the PGI 11 compiler suite,

vpkg_devrequire acml/5.0.0-pgi-fma4 pgi

jointly sets the UNIX environment for both ACML and the PGI compiler suite. Therefore, you should not also issue a vpkg_require pgi command.

Unlike vpkg_require, vpkg_devrequire also modifies key environment variables including LDFLAGS.

Putting it all together, the complete example using the library named acml is:

vpkg_devrequire acml/5.0.0-pgi-fma4 pgi
export FFLAGS='-fpic'
export LDLIBS='-lacml'
pgfortran $FFLAGS $LDFLAGS fdriver.f90 $LDLIBS -o driver

Note that $LDFLAGS must be in the compile statement but does not need an explicit export command here. The vpkg_devrequire command above defined and exported LDFLAGS and its value.

Using user-supplied libraries and centrally supplied libraries together

This final example illustrates how to use your fstat and fpoly libraries (both in $HOME/lib) with the ACML 5.0.0 library:

vpkg_devrequire acml/5.0.0-pgi-fma4 pgi
export FFLAGS='-fpic'
export LDFLAGS="-L$HOME/lib $LDFLAGS"
export LDLIBS='-lacml -lfstat -lfpoly'
pgfortran $FFLAGS $LDFLAGS fdriver.f90 $LDLIBS -o driver

Remember that the library search order depends on the order of the -L directories in LDFLAGS.
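The search order can be inspected by printing the -L entries one per line. In the sketch below, the initial LDFLAGS value stands in for whatever vpkg_devrequire would export; the path /opt/example/acml/lib is made up for illustration.

```shell
# Stand-in for the value vpkg_devrequire would export; this path is made up.
LDFLAGS='-L/opt/example/acml/lib'

# Prepend the personal directory so it is searched before the central one.
# Double quotes are needed so that $HOME and $LDFLAGS are expanded.
LDFLAGS="-L$HOME/lib $LDFLAGS"
export LDFLAGS

# Print one -L entry per line, in search order (relies on word splitting,
# so it assumes the paths contain no spaces).
for flag in $LDFLAGS; do
    echo "$flag"
done
```

Appending instead (LDFLAGS="$LDFLAGS -L$HOME/lib") would make the centrally installed directory take precedence.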

C examples illustrated with the PGI compiler suite

Reviewing the basic compilation command

The general command for compiling source code:

«compiler» «compiler_flags» «source_code_filename» -o «executable_filename»

For example,

vpkg_require pgi
pgcc -fpic cdriver.c -o driver

Using user-supplied libraries

To compile cdriver.c and link it to a shared C library named libcstat.so stored in $HOME/lib, with header files in $HOME/inc, add the include directory, the library location, and the library name (cstat) to the command:

pgcc -fpic -I$HOME/inc -L$HOME/lib cdriver.c -lcstat -o driver

The -I option flag is for the include (header file) directory's location; the -L flag is for the shared library directory's name; and the -l flag is for the specific library name.

You can simplify this compiler command by creating and exporting four special environment variables. CFLAGS represents a set of C compiler option flags; CPPFLAGS represents the C preprocessor flags, such as include directories; LDFLAGS represents the location of your shared libraries; and LDLIBS represents the choice of libraries.

vpkg_require pgi
export CFLAGS='-fpic'
export CPPFLAGS="-I$HOME/inc"
export LDFLAGS="-L$HOME/lib"
export LDLIBS='-lcstat'
pgcc $CFLAGS $CPPFLAGS $LDFLAGS cdriver.c $LDLIBS -o driver

Extending this further, you might have several libraries in one or more locations. In that case, list all of the '-l' flags in the LDLIBS statement, for example,

export LDLIBS='-lcstat -lcpoly'

and all of the '-L' flags in the LDFLAGS statement. (The order in which the '-L' directories appear in LDFLAGS determines the search order.)
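The same idea applies to header directories: each one needs its own -I prefix in CPPFLAGS. The loop below is a minimal sketch that builds CPPFLAGS from a list of directories and then prints the resulting compile line without running it. $HOME/inc comes from the example above, while $HOME/include is a hypothetical second directory.

```shell
# Build CPPFLAGS from a list of header directories; each needs its own -I.
# $HOME/inc comes from the example above; $HOME/include is hypothetical.
CPPFLAGS=""
for dir in "$HOME/inc" "$HOME/include"; do
    # Add a separating space only when CPPFLAGS already has content.
    CPPFLAGS="${CPPFLAGS:+$CPPFLAGS }-I$dir"
done
export CPPFLAGS

# Dry run: print the compile line instead of executing it.
echo "pgcc -fpic $CPPFLAGS -L$HOME/lib cdriver.c -lcstat -o driver"
```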

Using centrally supplied libraries (ACML, MKL, FFTW, etc.)

This extends the previous section's example by illustrating how to use VALET's vpkg_devrequire command to locate and link a system-supplied library, such as AMD's Core Math Library, ACML. Several releases (versions) of a library may be installed, and some may have been compiled with several compiler suites.

To view your choices, use VALET's vpkg_versions command:

vpkg_versions acml

The example below uses the acml/5.0.0-pgi-fma4 version, the single-threaded, ACML 5.0.0 FMA4 library compiled with the PGI 11 compilers. Since that version depends on the PGI 11 compiler suite,

vpkg_devrequire acml/5.0.0-pgi-fma4 pgi

jointly sets the UNIX environment for both ACML and the PGI compiler suite. Therefore, you should not also issue a vpkg_require pgi command.

Unlike vpkg_require, vpkg_devrequire also modifies key environment variables including LDFLAGS and CPPFLAGS.

Putting it all together, the complete example using the library named acml is:

vpkg_devrequire acml/5.0.0-pgi-fma4 pgi
export CFLAGS='-fpic'
export LDLIBS='-lacml'
pgcc $CFLAGS $CPPFLAGS $LDFLAGS cdriver.c $LDLIBS -o driver

Note that $CPPFLAGS and $LDFLAGS must be in the compile statement even though no export CPPFLAGS or export LDFLAGS statement appears above. The vpkg_devrequire command above defined and exported CPPFLAGS and LDFLAGS and their values.

Using user-supplied libraries and centrally supplied libraries together

The final example illustrates how to use your cstat and cpoly libraries (both in $HOME/lib) with the acml library:

vpkg_devrequire acml/5.0.0-pgi-fma4 pgi
export CFLAGS='-fpic'
export CPPFLAGS="$CPPFLAGS -I$HOME/inc"
export LDFLAGS="-L$HOME/lib $LDFLAGS"
export LDLIBS='-lacml -lcstat -lcpoly'
pgcc $CFLAGS $CPPFLAGS $LDFLAGS cdriver.c $LDLIBS -o driver

Remember that the library search order depends on the order of the -L directories in LDFLAGS.

  • abstract/darwin/app_dev/prog_env.txt
  • Last modified: 2022-08-30 10:11
  • by anita