NAMD on Caviness
Batch job
The Open MPI Slurm job submission script template should be used for NAMD jobs on Caviness and can be found in /opt/shared/templates/slurm/generic/mpi/openmpi. Copy and edit the template to suit your job requirements by following the comments in the openmpi.qs file.
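For example, you might copy the template into your working directory under a name of your choosing (namd_job.qs below is only an illustration), edit it, and then submit it with sbatch:

$ cp /opt/shared/templates/slurm/generic/mpi/openmpi/openmpi.qs namd_job.qs
$ sbatch namd_job.qs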
$ vpkg_versions namd
Available versions in package (* = default version):
[/opt/shared/valet/2.1/etc/namd.vpkg_yaml]
namd                   Scalable Molecular Dynamics
  2.12                 Version 2.12
* 2.13                 Version 2.13
  2.13:gpu             Version 2.13 (with CUDA support)
  2.14                 compiled with Intel 2020, Open MPI 4.1.4
  3.0b3                compiled with Intel 2020, Open MPI 4.1.4
  3.0b3:cuda-11.3.1    compiled with Intel 2020, CUDA 11
  3.0b3:cuda-12.1.1    compiled with Intel 2020, CUDA 12
The * version is loaded by default when using vpkg_require namd. Make sure you select a GPU variant of the namd package if you plan to use GPUs, e.g. vpkg_require namd:gpu, and provide the correct options to namd in the job script:
${UD_MPIRUN} namd2 +idlepoll +p${SLURM_CPUS_ON_NODE} +devices ${CUDA_VISIBLE_DEVICES} ...
The NAMD documentation indicates that +idlepoll must always be used for runs on CUDA devices. Slurm sets CUDA_VISIBLE_DEVICES to the device indices your job was granted, and SLURM_CPUS_ON_NODE to the number of CPUs granted to you. ${UD_MPIRUN} is set up as part of the job script template provided in the /opt/shared/templates/slurm/generic/mpi/openmpi/openmpi.qs file.
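Putting these pieces together, the NAMD-specific portion of a GPU job script derived from the template might look like the sketch below. The resource counts and the input/output file names (config.namd, output.log) are placeholders for illustration only, and the rest of the openmpi.qs logic should be left intact.

# resource requests (illustrative values; adjust to your job)
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --gres=gpu:1

# load a GPU-enabled NAMD build
vpkg_require namd:gpu

# run NAMD on the CPUs and GPU(s) Slurm granted to this job
${UD_MPIRUN} namd2 +idlepoll +p${SLURM_CPUS_ON_NODE} +devices ${CUDA_VISIBLE_DEVICES} config.namd > output.log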
Keep in mind that the job script templates in /opt/shared/templates/slurm have changed over time, especially as we learn more about what works well on a particular cluster.
Scaling
Scaling results are presented using the ApoA1 benchmark as an example; performance improves as the number of CPUs and GPUs increases. The commands used for the CPU-only and GPU runs are shown below.
vpkg_require namd/3.0b3
charmrun namd3 +p$SLURM_NTASKS apoa1.namd > apoa1.log
vpkg_require namd/3.0b3:cuda-12.1.1
charmrun namd3 +idlepoll +p$SLURM_CPUS_PER_TASK +devices $CUDA_VISIBLE_DEVICES apoa1.namd > apoa1.log
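As an illustration only, the GPU benchmark command could be wrapped in a minimal batch script such as the sketch below. The job name, CPU count, and GPU request are placeholder values, and VALET is assumed to be available in the job's shell environment (as it is when starting from the cluster's job script templates).

#!/bin/bash -l
#
# Minimal sketch of a batch script for the ApoA1 GPU benchmark run.
#SBATCH --job-name=apoa1-gpu
#SBATCH --nodes=1
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=8
#SBATCH --gres=gpu:1

vpkg_require namd/3.0b3:cuda-12.1.1
charmrun namd3 +idlepoll +p$SLURM_CPUS_PER_TASK +devices $CUDA_VISIBLE_DEVICES apoa1.namd > apoa1.log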