====== Grid Engine: Exclusive Node Usage ======

The distributed parallel environments (e.g. Open MPI) on the Mills cluster are all configured to "fill up" nodes: as many slots as possible are allocated on one node before the scheduler proceeds to another node to satisfy a job's slot requirement.  In theory, one would expect that submitting four 24-core jobs to an empty set of queue instances backed by four 24-core nodes would send each of the four jobs to its own node.  In practice, the complexity of the Grid Engine scheduling algorithms does not always yield this ideal behavior:  while the first job may land on a single node, the second, third, and fourth may wind up straddling the remaining three nodes.  This behavior is often frustrating to end users (it is a seemingly non-intuitive use of resources by a tool that should excel at that chore) as well as detrimental to the jobs' efficiency.

What is desired in such cases is //exclusive use// of one or more nodes.  For example, under the PBS system one can communicate more than just a total processor count to the job scheduler:

<code>
… -l nodes=2:ppn=24 …
</code>

indicating to the scheduler that the job wants to be allocated two full nodes and will run 24 worker processes per node (ppn).  If the target queue is sub-typed as "exclusive" rather than "shared," then the job is eligible to run only on nodes with nothing else running on them, and will itself be the only thing allowed to run on the two nodes that are chosen -- hence, //exclusive use//.

===== Configuring Exclusive Access =====

Grid Engine 6.2u3 introduced a special //relational operator// in its resource-modelling system (complexes).  The ''EXCL'' operator can be associated with a boolean complex.  The behavior of this operator on a consumable boolean complex called //exclusive// is summarized below:

^ value of //exclusive// on the execution host ^ value of //exclusive// requested by the job ^ comparison result ^
| not specified | not specified | TRUE |
| not specified | FALSE (0) | TRUE |
| not specified | TRUE (1) | TRUE |
| TRUE (1) | FALSE (0) | TRUE |
| TRUE (1) | TRUE (1) | TRUE |
| FALSE (0) | FALSE (0) | FALSE |
| FALSE (0) | TRUE (1) | FALSE |

If this complex is assigned a TRUE value on a per-execution-host basis, then each execution host can:

  * concurrently run any number of jobs that do not request the ''exclusive=1'' complex
  * run AT MOST a single job that requests the ''exclusive=1'' complex

Normal scheduling priorities determine the order in which all jobs (regardless of exclusivity) contend for resources.  A job that consumes the ''exclusive'' complex for a host blocks any other job from making use of resources on that host -- even if free slots are available.  For example, a job that requests 26 slots and ''exclusive=1'' will consume two (2) entire compute nodes.  If both nodes have 24 cores, the job effectively holds 22 cores "hostage" for the lifetime of its execution.

When requesting exclusive usage of nodes, use core counts evenly divisible by 24 so that no node is left mostly idle.  In the example above, the additional 2 workers on the second node are unlikely to contribute much of a speed-up to the program, since they must use network operations to communicate with the majority of their peers.

===== Requesting Exclusive Use =====

Exclusive use is requested for a job either in its script header:

<code>
#$ -l exclusive=1
</code>

or from the ''qsub'' command line:

<code>
qsub … -l exclusive=1 …
</code>
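Putting the pieces together, a batch script for an Open MPI job that wants two whole 24-core nodes to itself might look like the following minimal sketch.  The parallel environment name ''openmpi'' and the executable name are assumptions used for illustration; substitute the parallel environment and program appropriate to your job.

<code bash>
#!/bin/bash
#
# Request 48 slots (two full 24-core nodes) via a distributed parallel
# environment.  The PE name "openmpi" is assumed here.
#$ -pe openmpi 48
#
# Request exclusive use of the nodes on which the job runs.
#$ -l exclusive=1
#
# Run from the directory in which the job was submitted.
#$ -cwd

# NSLOTS is set by Grid Engine to the number of slots granted to the job.
# "./my_mpi_program" is a placeholder for the actual executable.
mpirun -np ${NSLOTS} ./my_mpi_program
</code>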
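For reference, on the administrative side the ''exclusive'' complex described in the configuration section above is typically created by editing the complex list with ''qconf -mc'' and then assigned a value on each execution host with ''qconf -me''.  The sketch below shows the general shape of those entries; the shortcut, urgency, and hostname are illustrative assumptions rather than the exact Mills settings.

<code>
# One line added to the complex configuration via "qconf -mc":
#name        shortcut  type  relop  requestable  consumable  default  urgency
exclusive    excl      BOOL  EXCL   YES          YES         0        1000

# Then, for each execution host (edited via "qconf -me <hostname>"),
# enable the complex on that host:
complex_values    exclusive=true
</code>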