technical:generic:gaussian-linda-integration

technical:generic:gaussian-linda-integration [2020-08-06 13:22] – created freytechnical:generic:gaussian-linda-integration [2020-08-06 14:23] (current) frey
 This document contains a summary of the steps necessary to tightly integrate Gaussian multi-node parallelism (TCPLinda) into our Slurm clusters.
 +
 +===== Scheduling =====
 +
 +The current infrastructure for Gaussian jobs expects submissions like
 +
 +<code bash>
 +#SBATCH --nodes=1
 +#SBATCH --ntasks=1
 +#SBATCH --cpus-per-task=<C>
 +#SBATCH --mem=<M>
 +# alternatively, in place of --mem: #SBATCH --mem-per-cpu=<M>
 +</code>
 +
 +The allocated CPU core ids are used to construct a ''GAUSS_CDEF'' environment variable and the on-node memory limit -- minus fixed/per-core overhead -- produces a value for ''GAUSS_MDEF''.
 +
 +<note important>The infrastructure also sets ''GAUSS_GDEF'' if a GPU-enabled version of Gaussian is being used on a node with GPUs allocated to the job.</note>
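The memory calculation described above can be sketched as follows. This is only an illustration, not the cluster's actual implementation: the function name ''gauss_mdef'' and the overhead figures passed to it are hypothetical, and all quantities are assumed to be in MB.

```shell
# Hypothetical helper: subtract a fixed and a per-core overhead (all in MB)
# from the on-node memory limit and print a value usable as GAUSS_MDEF.
gauss_mdef() (
    total_mb=$1      # memory limit for the job on this node
    ncores=$2        # number of allocated CPU cores
    fixed_mb=$3      # assumed fixed overhead
    percore_mb=$4    # assumed overhead per core
    usable_mb=$(( total_mb - fixed_mb - ncores * percore_mb ))
    if [ "$usable_mb" -le 0 ]; then
        echo "gauss_mdef: overhead exceeds the memory limit" >&2
        return 1
    fi
    printf '%sMB\n' "$usable_mb"
)

# Possible use inside a job submitted with --mem (overheads are placeholders):
# export GAUSS_MDEF="$(gauss_mdef "$SLURM_MEM_PER_NODE" "$SLURM_CPUS_PER_TASK" 512 64)"
```

With a 65536 MB limit, 18 cores, and the placeholder overheads of 512 MB fixed plus 64 MB per core, the helper prints ''63872MB''.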
 +
 +For multi-node Gaussian jobs, the submission profile would be expected to change to:
 +
 +<code bash>
 +#SBATCH --nodes=<N>
 +#SBATCH --ntasks-per-node=<T>
 +#SBATCH --cpus-per-task=<C>
 +#SBATCH --mem=<M>
 +# alternatively, in place of --mem: #SBATCH --mem-per-cpu=<M>
 +</code>
 +
 +Each //task// would correspond to a TCPLinda worker, and each worker would have <C> cores for SMP parallelism.  The ''GAUSS_WDEF'' environment variable would need to be set accordingly; e.g. for ''SLURM_JOB_NODELIST=r00n[00-02]'', two tasks per node (''SLURM_TASKS_PER_NODE=2(x3)''), and ''SLURM_CPUS_PER_TASK=18'' the environment would contain:
 +
 +^Variable^Value^
 +|''GAUSS_WDEF''|''r00n00:2,r00n01:2,r00n02:2''|
 +|''GAUSS_CDEF''|Not set|
 +|''GAUSS_PDEF''|''$SLURM_CPUS_PER_TASK'' (here ''18'')|
 +|''GAUSS_MDEF''|Memory per task - overhead FIXME|
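A minimal sketch of how the ''GAUSS_WDEF'' value in the table could be assembled. The function name ''build_gauss_wdef'' is hypothetical, and the sketch assumes the same task count on every node (as with ''--ntasks-per-node''); a heterogeneous allocation would need per-node counts instead.

```shell
# Hypothetical helper: join "host:tasks" pairs with commas.
# Usage: build_gauss_wdef <tasks-per-node> <host> [<host> ...]
build_gauss_wdef() (
    tasks=$1
    shift
    wdef=
    for host in "$@"; do
        # Prepend a comma only when wdef already holds an entry.
        wdef="${wdef:+$wdef,}$host:$tasks"
    done
    printf '%s\n' "$wdef"
)

# Inside a job, the host list could come from expanding the Slurm nodelist:
# export GAUSS_WDEF="$(build_gauss_wdef "$SLURM_NTASKS_PER_NODE" \
#     $(scontrol show hostnames "$SLURM_JOB_NODELIST"))"
```

For the example allocation, ''build_gauss_wdef 2 r00n00 r00n01 r00n02'' prints ''r00n00:2,r00n01:2,r00n02:2''.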
 +
 +Since Slurm will not in general allocate the same <C> CPU core ids to each task, the now-deprecated ''GAUSS_PDEF'' CPU count must be used; otherwise, ''srun'' will communicate the same ''GAUSS_CDEF'' to each task and cores will fail to be bound.  If future versions of Gaussian remove the ''GAUSS_PDEF'' functionality entirely, it will be necessary either to allocate entire nodes (at 1 task per node) or to wrap the Linda worker startup in a script that reconfigures ''GAUSS_CDEF'' on the fly.  The latter is probably preferable, as it would also allow ''GAUSS_MDEF'' and ''GAUSS_GDEF'' to be customized -- which would be extremely important for heterogeneous job allocations.
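The wrapper idea could look roughly like the sketch below. It assumes a Linux node where Slurm has already restricted the worker's CPU affinity (e.g. via its affinity or cgroup task plugins), so that the kernel's ''Cpus_allowed_list'' reflects the cores allocated to that task, and it assumes Gaussian accepts that list format in ''GAUSS_CDEF''. The function names ''cpus_allowed_list'' and ''linda_worker_exec'' are hypothetical.

```shell
# Hypothetical helper: print the CPU list the kernel permits for a process,
# read from a /proc-style status file (defaults to the current process).
cpus_allowed_list() {
    awk '/^Cpus_allowed_list:/ { print $2 }' "${1:-/proc/self/status}"
}

# Hypothetical wrapper: set GAUSS_CDEF from the actual CPU affinity, drop the
# count-based GAUSS_PDEF, then replace this shell with the worker command.
linda_worker_exec() {
    GAUSS_CDEF=$(cpus_allowed_list) || return 1
    [ -n "$GAUSS_CDEF" ] || return 1
    export GAUSS_CDEF
    unset GAUSS_PDEF
    exec "$@"
}

# Possible use as the command started by srun for each worker:
# linda_worker_exec <original worker command and arguments>
```

The same wrapper would be a natural place to recompute ''GAUSS_MDEF'' and ''GAUSS_GDEF'' per worker.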
 +
 +===== TCPLinda Worker Startup =====
 +
 +The ''linda_rsh'' script is used by TCPLinda to execute a remote command on a node participating in the job.  By default it uses ''rsh'' or ''ssh'' to connect to the remote host; on a Slurm cluster, however, the ''srun'' command must be used for proper job containment and accounting.
 +
 +The ''linda_rsh'' script would require modification to make use of ''srun'' **only**.  Since TCPLinda executes one instance of ''linda_rsh'' per Linda task, the job's batch step would perform ((<N>*<T>)-1) invocations of ''linda_rsh''.  Each such invocation will be made against a specific hostname (pulled from ''GAUSS_WDEF'').  The modification would change e.g. the ''ssh'' invocation
 +
 +<code bash>
 +/usr/bin/ssh -x $host $user -n "$@"
 +</code>
 +
 +to
 +
 +<code bash>
 +srun --nodes=1 --ntasks=1 --cpus-per-task=${SLURM_CPUS_PER_TASK:-1} --nodelist=$host "$@"
 +</code>
 +
 +Each TCPLinda worker (beyond the primary Gaussian process in the batch step) would be a separate job step in the final accounting of the job.
  • Last modified: 2020-08-06 13:22
  • by frey