The distributed parallel environments (e.g. Open MPI) on the Mills cluster are all configured to "fill up" nodes by allocating as many slots as possible before proceeding to another node to satisfy a job's slot requirement. In theory, one would expect that submitting four 24-core jobs to an empty set of queue instances backed by four 24-core nodes would send each of the four jobs to its own node. In practice, the complexity of Grid Engine job scheduling algorithms does not always yield ideal behavior: while the first job may land on a single node, the second, third, and fourth may wind up straddling the remaining three nodes. This behavior is often frustrating to end users (it seems a non-intuitive use of resources by a tool that should excel at that chore) and is detrimental to the job's efficiency.
What is desired in such cases is exclusive use of one or more nodes. For example, under the PBS system one can communicate more than just a total processor count to the job scheduler:
… -l nodes=2:ppn=24 …
indicating to the scheduler that the job wants to be allocated two full nodes and will run 24 worker processes per node (ppn). If the target queue is sub-typed as "exclusive" rather than "shared," then the job is eligible to run on any nodes with nothing else running on them, and will itself be the only thing allowed to run on the two nodes that are chosen – hence, exclusive use.
Grid Engine 6.2u3 introduced a special relational operator in its resource-modelling system (complexes). The EXCL operator can be associated with a boolean complex. The behavior of this operator on a consumable boolean complex called exclusive is summarized below:
| value of exclusive (host) | requested value of exclusive (job) | comparison result |
|---|---|---|
| not specified | not specified | TRUE |
| not specified | FALSE (0) | TRUE |
| not specified | TRUE (1) | TRUE |
| TRUE (1) | FALSE (0) | TRUE |
| TRUE (1) | TRUE (1) | TRUE |
| FALSE (0) | FALSE (0) | FALSE |
| FALSE (0) | TRUE (1) | FALSE |
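The comparison rules in the table can be modeled as a small predicate. This is an illustrative sketch only, not Grid Engine source; the function and parameter names are invented here, and `None` stands in for "not specified":

```python
def excl_matches(host_value, requested):
    """Model of the EXCL comparison on a boolean complex named exclusive.

    host_value: the host's setting (True, False, or None = not specified).
    requested:  the job's requested value (True, False, or None).
    Returns whether the comparison result is TRUE (host is a candidate).
    """
    # Per the table, the result depends only on the host's setting.
    if host_value is None:
        # Hosts that do not define the complex always compare TRUE.
        return True
    if host_value is True:
        # Hosts set to exclusive=1 compare TRUE for both exclusive and
        # non-exclusive requests; actual exclusivity is enforced once a
        # job consumes the complex (see below).
        return True
    # Hosts explicitly set to exclusive=0 compare FALSE.
    return False

# Reproduce the table rows:
for hv, rv in [(None, None), (None, False), (None, True),
               (True, False), (True, True),
               (False, False), (False, True)]:
    print(hv, rv, excl_matches(hv, rv))
```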
If this complex is assigned a value of exclusive=1 on an execution host basis, then each execution host can be claimed for exclusive use by a single job that requests the exclusive complex; jobs that do not request the complex are scheduled on that host as usual.
Normal scheduling priorities will determine the order in which all jobs (regardless of exclusivity) contend for resources. A job that consumes the exclusive complex for a host will block any other jobs from making use of resources on that host – even if there are free slots available.
For example, if my job requests 26 slots with exclusive=1, it will consume two (2) entire compute nodes. If both nodes have 24 cores, then this job will effectively be holding 22 cores "hostage" during the lifetime of its execution.
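The node count and the number of stranded cores in that example follow from simple arithmetic. A quick sketch (illustrative only; the function name is invented here):

```python
import math

def exclusive_footprint(slots, cores_per_node):
    """Nodes consumed, and cores left idle ("hostage"), when an
    exclusive job of `slots` processes is placed on whole nodes
    of `cores_per_node` cores each under fill-up allocation."""
    nodes = math.ceil(slots / cores_per_node)
    idle = nodes * cores_per_node - slots
    return nodes, idle

# A 26-slot exclusive job on 24-core nodes:
print(exclusive_footprint(26, 24))
```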
Exclusive use is requested for a job either in its script header:
:
#$ -l exclusive=1
:
or from the qsub command line:
> qsub … -l exclusive=1 …