Grid Engine: Exclusive Node Usage

The distributed parallel environments (e.g. Open MPI) on the Mills cluster are all configured to “fill up” nodes: the scheduler allocates as many slots as possible on one node before moving to another to satisfy a job's slot requirement. In theory, submitting four 24-core jobs to an empty set of queue instances backed by four 24-core nodes should place one job on each node. In practice, the complexity of the Grid Engine scheduling algorithms does not always yield that ideal: the first job may land on a single node while the second, third, and fourth wind up straddling the remaining three nodes. This behavior frustrates end users, since it looks like a counterintuitive use of resources by a tool that should excel at allocating them, and it is detrimental to the jobs' efficiency.

What is desired in such cases is exclusive use of one or more nodes. For example, under the PBS system one can communicate more than just a total processor count to the job scheduler:

  … -l nodes=2:ppn=24 …

indicating to the scheduler that the job wants to be allocated two full nodes and will run 24 worker processes per node (ppn). If the target queue is sub-typed as “exclusive” rather than “shared,” then the job is eligible to run on any nodes with nothing else running on them, and will itself be the only thing allowed to run on the two nodes that are chosen – hence, exclusive use.

Grid Engine 6.2u3 introduced a special relational operator in its resource-modelling system (complexes). The EXCL operator can be associated with a boolean complex. The behavior of this operator on a consumable boolean complex called exclusive is summarized below:

value of exclusive    requested value of exclusive    comparison result
not specified         not specified                   TRUE
not specified         FALSE (0)                       TRUE
not specified         TRUE (1)                        TRUE
TRUE (1)              FALSE (0)                       TRUE
TRUE (1)              TRUE (1)                        TRUE
FALSE (0)             FALSE (0)                       FALSE
FALSE (0)             TRUE (1)                        FALSE
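
The table can be restated as a small lookup, which may help when reasoning about a particular combination. This is only a sketch: the function name excl_compare and the spellings "unset", "0" and "1" for the three possible values are illustrative choices, not part of Grid Engine. It encodes the seven rows literally and refuses any combination the table does not list.

```shell
#!/bin/bash
# excl_compare VALUE REQUESTED
#   VALUE     - value of the exclusive complex: unset, 0 or 1
#   REQUESTED - value the job requested:        unset, 0 or 1
# Prints TRUE or FALSE exactly as listed in the table; returns 2 for
# combinations the table does not cover.
excl_compare() {
  case "$1:$2" in
    unset:unset|unset:0|unset:1|1:0|1:1) echo TRUE ;;
    0:0|0:1)                             echo FALSE ;;
    *) echo "not covered by the table: $1:$2" >&2; return 2 ;;
  esac
}
```

For instance, `excl_compare 1 1` prints TRUE, while `excl_compare 0 1` prints FALSE.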

If this complex is assigned a TRUE value on a per-host basis, then each execution host can:

  • concurrently run any number of jobs that do not request the exclusive=1 complex
  • run AT MOST a single job that requests the exclusive=1 complex
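
As a rough illustration of the configuration described above, the fragment below shows how such a complex and its per-host value might appear in the output of `qconf -sc` and `qconf -se`. The shortcut (excl), the default (0) and the urgency (1000) are assumed values chosen for illustration; the actual settings on Mills may differ.

```text
# Complex definition as listed by `qconf -sc`
# (columns: name, shortcut, type, relop, requestable, consumable, default, urgency)
exclusive    excl    BOOL    EXCL    YES    YES    0    1000

# Per-host value as listed by `qconf -se <hostname>`
complex_values    exclusive=true
```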

Normal scheduling priorities will determine the order in which all jobs (regardless of exclusivity) contend for resources. A job that consumes the exclusive complex for a host will block any other jobs from making use of resources on that host – even if there are free slots available.

For example, a job that requests 26 slots and exclusive=1 will consume two entire compute nodes. If both nodes have 24 cores, the job occupies 48 cores but uses only 26, effectively holding 22 cores “hostage” for the lifetime of its execution.

When requesting exclusive use of 24-core nodes, use core counts evenly divisible by 24 so that no node is left mostly idle. In the example presented above, the 2 additional workers on the second node are unlikely to contribute much speed-up, because they must use network operations to communicate with the majority of their peers.
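
The arithmetic behind this advice can be sketched in a few lines of shell. The function names are hypothetical, and the sketch assumes that every node has the same core count and that the job is packed onto the smallest possible number of nodes.

```shell
#!/bin/bash
# nodes_needed SLOTS CORES_PER_NODE
# Whole nodes an exclusive job occupies (integer ceiling division).
nodes_needed() {
  local slots=$1 cores=$2
  echo $(( (slots + cores - 1) / cores ))
}

# idle_cores SLOTS CORES_PER_NODE
# Cores held by the job but left without a worker process.
idle_cores() {
  local slots=$1 cores=$2
  local nodes=$(( (slots + cores - 1) / cores ))
  echo $(( nodes * cores - slots ))
}

# round_up_slots SLOTS CORES_PER_NODE
# Smallest slot count >= SLOTS that fills every node completely.
round_up_slots() {
  local slots=$1 cores=$2
  echo $(( ( (slots + cores - 1) / cores ) * cores ))
}
```

With the numbers from the example, `nodes_needed 26 24` prints 2, `idle_cores 26 24` prints 22, and `round_up_slots 26 24` prints 48, the slot count that would leave no core idle.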

Exclusive use is requested for a job either in its script header:

  :
#$ -l exclusive=1
  :

or from the qsub command line:

> qsub … -l exclusive=1 …
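
Putting the pieces together, a complete job script might look like the sketch below. The parallel environment name (openmpi) and the 48-slot request are assumptions made for illustration; substitute whatever the target cluster provides. The helper summarize_allocation is hypothetical as well: it reads the host file that Grid Engine exposes to parallel jobs through the PE_HOSTFILE environment variable (one line per host, with the host name and slot count in the first two fields) so that the job log records which nodes were granted.

```shell
#!/bin/bash
#$ -pe openmpi 48
#$ -l exclusive=1

# Print "host: N slots" for each line of a Grid Engine PE host file.
# $1: path to the host file (fields: host, slots, queue, processor range)
summarize_allocation() {
  awk '{ printf "%s: %d slots\n", $1, $2 }' "$1"
}

if [ -n "${PE_HOSTFILE:-}" ] && [ -r "${PE_HOSTFILE}" ]; then
  # Running under Grid Engine: record the nodes this job holds.
  summarize_allocation "${PE_HOSTFILE}"
else
  echo "PE_HOSTFILE is not set; not running as a parallel Grid Engine job"
fi
```

If the request is satisfied with two full 24-core nodes, the summary should list exactly two hosts with 24 slots each.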
  • Last modified: 2012-08-08 09:55