One feature of Grid Engine I've only now discovered and tested is queue subordination. All of the clusters I've managed to date have used (in retrospect) a flat set of queues, differentiated merely by the hosts on which instances were created or by the parallel environments available. Recently I experimented (successfully) with leveraging the queue sequence number (the seqno attribute in a queue's configuration) and slot-limiting resource quotas to partition the full complement of nodes equally between threaded and distributed parallel programs.
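The slot-limiting resource quotas aren't the subject of this post, but for context: a resource quota set can cap the slots available to each parallel environment. The PE names and slot counts below are illustrative rather than the actual production values; a minimal sketch of such a rule set (installed with qconf -arqs) might look like:

{
   name         partition_parallel_slots
   description  Split slots evenly between threaded and distributed PEs
   enabled      TRUE
   limit        pes {threads} to slots=256
   limit        pes {mpi} to slots=256
}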
A proposed feature of the new UD community cluster is the sharing of idle cores, with those cores' owner(s) granted preferential access at any time. So while Professor X is writing his latest paper and not using his cores, Magneto can submit a job targeting idle resources and utilize those cores. However, if Professor X needs to rerun a few computations to satisfy a peer reviewer of his paper, Magneto's jobs will be killed to make cores available for Professor X.
Under this scheme, an idle queue spans all cluster resources, while one or more owner queues apply to specific nodes purchased by an investing entity. The idle queue is configured to be subordinate to all owner queues. The subordination is defined on a per-host basis, with a threshold indicating at what point idle jobs must make way for owner jobs:
qname: profx_3day.q
hostlist: @profx.hosts
  :
subordinate: slots=NONE,[@profx.hosts=slots=2(idle.q:0:sr)]
  :
This subordinate directive states the following: if the number of slots in use on a host (in the @profx.hosts host list) is greater than 2 because of jobs running in idle.q, and a job entering the profx_3day.q queue on that host requires less-than or equal-to the number of slots coming from idle.q, then begin suspending jobs running on that host via idle.q, starting with the shortest accumulated runtime.
By default, Grid Engine suspends a task by sending the SIGSTOP signal; the task can later resume execution by means of SIGCONT. This scheme does not evict the task from memory on the execution host and will not work properly for distributed parallel programs. It also precludes any possibility of "migrating" the evicted task(s) onto other available resources.
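In practice the subordinate clause is added by editing the owner queue's configuration with qconf; a sketch of the workflow, using the queue name from the example above:

# open the owner queue definition for editing (qconf launches $EDITOR)
qconf -mq profx_3day.q

# verify the subordination clause afterwards
qconf -sq profx_3day.q | grep subordinate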
I set up two test queues on Strauss: 3day.q stands in for an owner queue, and idle.q is the all-access idle queue. Both queues are present on a single host and have two execution slots. Suppose there is a single core being used by Professor X, and a single core by Magneto:
[frey@strauss ~]$ qstat
job-ID  prior    name   user     state  submit/start at      queue                     slots  ja-task-ID
-----------------------------------------------------------------------------------------------------------------
     34 0.55500  x.qs   magneto  r      11/02/2011 15:01:34  idle.q@strauss.udel.edu       1
     36 0.55500  a.qs   profx    r      11/02/2011 15:01:49  3day.q@strauss.udel.edu       1
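The job scripts themselves aren't reproduced here; any long-running, single-slot script will do for this kind of test. A purely illustrative stand-in:

#!/bin/sh
#$ -S /bin/sh
# occupy one slot for a couple of hours
sleep 7200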
Magneto is hungry for CPU time, so he submits two additional jobs on the idle queue:
[magneto@strauss ~]$ qsub -q idle.q y.qs
Your job 37 ("x.qs") has been submitted
[magneto@strauss ~]$ qsub -q idle.q z.qs
Your job 38 ("x.qs") has been submitted
[magneto@strauss ~]$ qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
3day.q@strauss.udel.edu        BIP   0/1/2          0.54     sol-sparc64
     36 0.55500 a.qs       profx        r     11/02/2011 15:01:49     1
---------------------------------------------------------------------------------
idle.q@strauss.udel.edu        BIP   0/1/2          0.54     sol-sparc64   P
     34 0.55500 x.qs       magneto      r     11/02/2011 15:01:34     1

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
     37 0.55500 y.qs       magneto      qw    11/02/2011 15:02:02     1
     38 0.55500 z.qs       magneto      qw    11/02/2011 15:02:03     1
The idle.q instance now shows the P (overload) state. This state is produced by the subordinate clause that was added to the configuration for 3day.q: adding another job to that idle queue instance would exceed the threshold, so the jobs must wait.
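As an aside, Magneto can ask the scheduler why his jobs are still waiting: with schedd_job_info enabled in the scheduler configuration, qstat -j reports the reasons a pending job has not been dispatched, e.g.

[magneto@strauss ~]$ qstat -j 37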
Suddenly, Professor X finds that the input to one of his tasks was incorrect, and he must recalculate one figure for his paper. He submits a job:
[profx@strauss ~]$ qsub -q 3day.q b.qs
Your job 39 ("x.qs") has been submitted
[profx@strauss ~]$ qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
3day.q@strauss.udel.edu        BIP   0/2/2          0.49     sol-sparc64
     36 0.55500 a.qs       profx        r     11/02/2011 15:01:49     1
     39 0.55500 b.qs       profx        r     11/02/2011 15:04:19     1
---------------------------------------------------------------------------------
idle.q@strauss.udel.edu        BIP   0/0/2          0.49     sol-sparc64   P

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
     37 0.55500 y.qs       magneto      qw    11/02/2011 15:02:02     1
     38 0.55500 z.qs       magneto      qw    11/02/2011 15:02:03     1
Ah! Magneto's x.qs job has been evicted from idle.q on the host. Since idle.q was reconfigured to send SIGKILL instead of SIGSTOP, the offending job was outright terminated to make room for the owner's work.
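The mechanics of that reconfiguration aren't shown above; one way to accomplish it is the queue's suspend_method attribute, which overrides the default suspension signal with another signal (or a script). A minimal sketch of the relevant lines in idle.q's configuration, edited via qconf -mq idle.q:

qname: idle.q
  :
suspend_method: SIGKILL
  :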
We fast-forward several hours, and Professor X's a.qs job has completed:
[frey@strauss ~]$ qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
3day.q@strauss.udel.edu        BIP   0/1/2          0.48     sol-sparc64
     39 0.55500 b.qs       profx        r     11/02/2011 15:04:19     1
---------------------------------------------------------------------------------
idle.q@strauss.udel.edu        BIP   0/1/2          0.48     sol-sparc64   P
     37 0.55500 y.qs       magneto      r     11/02/2011 15:05:19     1

############################################################################
 - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS - PENDING JOBS
############################################################################
     38 0.55500 z.qs       magneto      qw    11/02/2011 15:02:03     1
This has opened up a slot in idle.q, which the waiting y.qs job consumes. Once the other job owned by Professor X completes:
[frey@strauss ~]$ qstat -f
queuename                      qtype resv/used/tot. load_avg arch          states
---------------------------------------------------------------------------------
3day.q@strauss.udel.edu        BIP   0/0/2          0.46     sol-sparc64
---------------------------------------------------------------------------------
idle.q@strauss.udel.edu        BIP   0/2/2          0.46     sol-sparc64   P
     37 0.55500 y.qs       magneto      r     11/02/2011 15:05:19     1
     38 0.55500 z.qs       magneto      r     11/02/2011 15:06:04     1
the idle queue can be fully utilized by Magneto.
It is not immediately clear what interplay will manifest between slot-based resource quotas and subordination. Likewise, the subordination threshold should be summed across all N owner queues present on a host, where in general N is greater than one. The behavior of 3-day, 1-day, and indeterminate-length queues under idle subordination needs some careful testing.
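To make the multi-owner concern concrete: each owner queue would presumably carry its own subordinate clause against idle.q, and the slots= threshold in each clause would need to reflect the summed slot count of all owner queues on the host, not just that owner's share. A hypothetical second owner queue, with made-up names and counts, might look like:

qname: profy_1day.q
hostlist: @profy.hosts
  :
subordinate: slots=NONE,[@profy.hosts=slots=4(idle.q:0:sr)]
  :

Here the threshold of 4 stands in for the total number of owner slots on @profy.hosts; whether that is the right accounting is exactly what needs testing.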