
Enhanced QLogin

Applications with graphical user interfaces, like Matlab or Mathematica, present additional challenges when deployed on a cluster. Since remote connections are usually not made directly to the compute nodes but are NATed through the cluster head node or a router, getting the side-band X11 traffic back to the user is the hard part. Luckily, SSH can forward X11 traffic itself. If a user ssh'es to a cluster head node from his desktop, a second ssh from there to a compute node will tunnel X11 traffic from the compute node back to the head node, and from there back to the user's desktop.
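
Setting Grid Engine aside for a moment, this is what the two-hop X11 forwarding looks like when done by hand (the host names here are placeholders, not actual Mills node names):

ssh -X myuser@head-node      # desktop -> head node, with X11 forwarding
ssh -X compute-node          # head node -> compute node; X11 tunnels back through both hops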

The problem comes when you introduce Grid Engine for interactive job scheduling via the qlogin command. On the Mills cluster here at UD, the behavior of qlogin was modified from day one to use a script we provide rather than the default; from qconf -sconf:

  :
qlogin_command               /opt/shared/GridEngine/local/qlogin_ssh
qlogin_daemon                /usr/sbin/sshd -i
rlogin_command               builtin
rlogin_daemon                builtin
rsh_command                  builtin
rsh_daemon                   builtin
  :
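
To reproduce this setup on another cluster, the global configuration would be edited and the two qlogin lines changed accordingly; for example:

qconf -mconf global          # opens the global cluster configuration in $EDITOR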

On the compute node the standard SSH daemon is used to accept the qlogin connection from the head node. On the head node, the qlogin connection is made by the /opt/shared/GridEngine/local/qlogin_ssh script:

#!/bin/sh

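# Grid Engine invokes this script with two arguments: the compute node's
# hostname and the TCP port to connect to.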
HOST=$1
PORT=$2

#
# Ensure that the environment on the remote host will
# match the working env here:
#
export SGE_QLOGIN_PWD=`/bin/pwd`
if [ -z "$WORKGROUP" ]; then
  export WORKGROUP=`id -g -n`
fi
if [ -z "$WORKDIR" ]; then
  WORKDIR=`/opt/shared/workgroup/bin/workdir -g $WORKGROUP`
  if [ $? = 0 ]; then
    export WORKDIR
  fi
fi

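# Tunnel X11 (-X/-Y) only if the user has an active display; otherwise
# open a plain SSH session.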
if [ "x$DISPLAY" = "x" ]; then
  exec /usr/bin/ssh -p $PORT $HOST
else
  exec /usr/bin/ssh -X -Y -p $PORT $HOST
fi

Grid Engine passes two arguments to the script: the hostname of the compute node and the TCP/IP port to use for the session. The SGE_QLOGIN_PWD variable is set to the working directory that was current when the qlogin command was issued; we'll see why in a moment. If DISPLAY is set, we assume the user has an active X11 session and tunnel it to the compute node; otherwise, a standard SSH session is opened. Note that we use exec because this script has nothing else to do once the connection is opened.

The standard behavior of this qlogin is thus to open a connection to a compute node and (as usual for ssh) leave the user in his home directory, running under his default Unix group (on Mills, everyone). Unfortunately, on Mills the idea is to have people transition into secondary groups in order to submit jobs to Grid Engine. The default qrsh under Grid Engine propagates the current Unix group to the compute node (if the sysadmin wants it to), so the user's compute-node environment is closer to what it was on the head node.

The work environment being promoted on Mills dictates that qlogin sessions work best if the remote environment:

  • uses the same Unix group that was active on the head node
  • uses the same working directory that was active on the head node

To accomplish this, we first need to modify which environment variables ssh will pass to the compute node. On the head node, the following was added to /etc/ssh/ssh_config:

  :
SendEnv XMODIFIERS
SendEnv WORKGROUP WORKDIR SGE_QLOGIN_PWD

Likewise, the SSH daemon on the compute nodes must be configured (in /etc/ssh/sshd_config) to accept those variables into the remote environment:

  :
AcceptEnv XMODIFIERS
AcceptEnv WORKGROUP WORKDIR SGE_QLOGIN_PWD
  :

With these three variables passed to the compute node, a /etc/profile.d script can detect their presence and react accordingly: it changes the working directory to SGE_QLOGIN_PWD and, using the value of WORKGROUP, can switch the session's Unix group via the workgroup command available on Mills (similar to newgrp). A sketch of such a script is shown below.
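
The following is only a minimal sketch of what such a profile.d script might look like; the file name is hypothetical, and the workgroup invocation (the -g flag is assumed by analogy with the workdir call above) is illustrative rather than the exact script in use on Mills:

# /etc/profile.d/qlogin_env.sh -- hypothetical name, minimal sketch
# Act only on sessions that arrived via the qlogin_ssh wrapper.
if [ -n "$SGE_QLOGIN_PWD" ]; then
  # Return to the directory that was current when qlogin was issued.
  if [ -d "$SGE_QLOGIN_PWD" ]; then
    cd "$SGE_QLOGIN_PWD"
  fi
  # If a secondary group was active on the head node and this shell is not
  # already running under it, re-enter it.  The id test keeps the shell
  # started by workgroup from looping back through this block.
  if [ -n "$WORKGROUP" ] && [ "`id -g -n`" != "$WORKGROUP" ]; then
    exec workgroup -g "$WORKGROUP"
  fi
fi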
