technical:generic:gaussian-16-on-ampere-gpus [2023-05-26 11:52] – frey | technical:generic:gaussian-16-on-ampere-gpus [2023-05-26 11:53] (current) – frey | ||
---|---|---|---|
Line 1: | Line 1: | ||
+ | ====== Compilation of Gaussian '16 for NVIDIA Ampere GPUs ====== | ||
+ | <WRAP center round important 60%> | ||
+ | In compliance with Gaussian licensing, this article includes no reproduction of Gaussian '16 source code or build files. | ||
+ | </ | ||
+ | |||
+ | |||
+ | The third-generation expansion to UD's Caviness cluster includes Intel Xeon Gold 6240R processors and NVIDIA A100 and A40 GPUs. The CPUs are significantly newer than Haswell, which is the newest architecture the Gaussian '16 build system can target for optimization. | ||
+ | |||
+ | < | ||
+ | [frey@r06g00 ~]$ PGI_ACC_DEBUG=1 openacc-test/ | ||
+ | | ||
+ | ACC: detected 4 CUDA devices | ||
+ | ACC: initialized 0 CUDA devices | ||
+ | ACC: device[1] is PGI native | ||
+ | ACC: device[0] is PGI native | ||
+ | pinitialize for thread 1 | ||
+ | | ||
+ | 0 | ||
+ | | ||
+ | pgi_uacc_set_device_num(devnum=0, | ||
+ | | ||
+ | curr_devid for thread 1 is 0 | ||
+ | 0 | ||
+ | | ||
+ | curr_devid for thread 1 is 0 | ||
+ | 0 | ||
+ | </ | ||
+ | |||
+ | The PGI 17 OpenACC runtime detects the A100s but cannot initialize them: it reports 4 CUDA devices detected and 0 initialized. The problem manifests in Gaussian runs as the following message: | ||
+ | |||
+ | < | ||
+ | : | ||
+ | | ||
+ | | ||
+ | | ||
+ | Error termination via Lnk1e in / | ||
+ | Job cpu time: 0 days 1 hours 5 minutes 56.1 seconds. | ||
+ | | ||
+ | </ | ||
+ | |||
+ | When Gaussian calls '' | ||
+ | |||
+ | With the NVIDIA HPC SDK 22.7 compiler suite (the PGI compilers, rebranded after NVIDIA acquired PGI), the OpenACC runtime can use the Ampere GPUs: | ||
+ | |||
+ | < | ||
+ | [frey@r06g00 ~]$ PGI_ACC_DEBUG=1 openacc-test/ | ||
+ | | ||
+ | ACC: detected 4 CUDA devices | ||
+ | cuda_initdev thread:0 data.default_device_num: | ||
+ | ACC: device[1] is NVIDIA CUDA device 0 compute capability 8.0 | ||
+ | ACC: device[2] is NVIDIA CUDA device 1 compute capability 8.0 | ||
+ | ACC: device[3] is NVIDIA CUDA device 2 compute capability 8.0 | ||
+ | ACC: device[4] is NVIDIA CUDA device 3 compute capability 8.0 | ||
+ | ACC: initialized 4 CUDA devices | ||
+ | ACC: device[5] is PGI native | ||
+ | pinitialize (threadid=1) | ||
+ | cuda_init_device thread:1 data.default_device_num: | ||
+ | cuda_init_device(threadid=1, | ||
+ | cuda_init_device(threadid=1, | ||
+ | cuda_init_device(threadid=1, | ||
+ | argument memory for queue 32 device: | ||
+ | | ||
+ | 4 | ||
+ | | ||
+ | pgi_uacc_set_device_num(devnum=0, | ||
+ | pgi_uacc_set_device_num(devnum=0, | ||
+ | | ||
+ | 84592361472 | ||
+ | | ||
+ | 85031714816 | ||
+ | </ | ||
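The two transcripts above differ most visibly in their `ACC: detected N CUDA devices` and `ACC: initialized N CUDA devices` lines. As a rough sketch, not part of the original build procedure, a short Python helper could compare those two counts in captured `PGI_ACC_DEBUG=1` output. The function names are hypothetical, and the regular expression assumes the exact line wording shown in the transcripts.

```python
import re

# Matches lines such as "ACC: detected 4 CUDA devices" and
# "ACC: initialized 0 CUDA devices" (wording assumed from the transcripts).
_ACC_LINE = re.compile(r"ACC:\s+(detected|initialized)\s+(\d+)\s+CUDA devices")


def acc_device_counts(debug_output):
    """Return (detected, initialized) CUDA device counts; None where a line is absent."""
    counts = {"detected": None, "initialized": None}
    for match in _ACC_LINE.finditer(debug_output):
        counts[match.group(1)] = int(match.group(2))
    return counts["detected"], counts["initialized"]


def gpus_usable(debug_output):
    """True only if at least one CUDA device was detected and all were initialized."""
    detected, initialized = acc_device_counts(debug_output)
    return bool(detected) and detected == initialized


if __name__ == "__main__":
    # Sample lines copied from the two transcripts (PGI 17, then HPC SDK 22.7).
    pgi17 = "ACC: detected 4 CUDA devices\nACC: initialized 0 CUDA devices\n"
    nvhpc = "ACC: detected 4 CUDA devices\nACC: initialized 4 CUDA devices\n"
    print("PGI 17 runtime usable:", gpus_usable(pgi17))
    print("HPC SDK 22.7 runtime usable:", gpus_usable(nvhpc))
```

A check like this only confirms that the runtime initialized the devices. It says nothing about whether a given Gaussian binary was built with a matching GPU target.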
+ | |||
+ | To produce Gaussian '16 binaries that can use the A100 and A40 GPUs in Caviness, both the build system and the source code must be altered. | ||
+ | |||
+ | ===== Add Skylake Target ===== | ||
+ | |||
+ | Because the build system had to be changed anyway, directives targeting Skylake-generation Intel CPUs were added at the same time. | ||
+ | |||
+ | The '' | ||
+ | |||
+ | ==== Flags ==== | ||
+ | |||
+ | The flags for Haswell were reused with a target processor of " | ||
+ | |||
+ | ==== BLAS/LAPACK ==== | ||
+ | |||
+ | The Haswell-optimized ATLAS library included with the Gaussian '16 source is used for the '' | ||
+ | |||
+ | <WRAP center round tip 60%> | ||
+ | Substituting the serial Intel MKL library for the ATLAS BLAS/LAPACK is another optimization variant we might try. That would be a significant change, and the resulting binaries would require validation before being used for production computation. | ||
+ | </ | ||
+ | |||
+ | ===== Add Ampere OpenACC Target ===== | ||
+ | |||
+ | The token '' | ||
+ | |||
+ | ==== Flags ==== | ||
+ | |||
+ | NVIDIA HPC SDK 22.7 flags that enable the necessary functionality are: | ||
+ | |||
+ | < | ||
+ | … -cuda -acc=gpu, | ||
+ | </ | ||
+ | |||
+ | ==== OpenACC Changes ==== | ||
+ | |||
+ | The OpenACC implementation in NVIDIA HPC SDK 22.7 differs from the one in PGI 17: | ||
+ | |||
+ | * The syntax '' | ||
+ | * The syntax '' | ||
+ | * The syntax '' | ||
+ | * The '' |