Enhanced QLogin
Applications like Matlab or Mathematica with graphical user interfaces present additional challenges in cluster deployment scenarios. Since remote connections are usually not made directly to the compute nodes but NATed through the cluster head node or a router, passing the side-band X11 traffic is challenging. Luckily, SSH can tunnel X11 traffic: if a user ssh'es to a cluster head node from his desktop, a second ssh from there to a compute node will successfully tunnel X11 traffic from the compute node back to the head node, and from there back to the user's desktop.
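The two-hop tunnel described above looks like this in practice (hostnames and usernames are illustrative, not actual Mills names):

```
# On the user's desktop: connect to the head node with X11 forwarding
ssh -X user@head-node.example.edu

# On the head node: a second forwarded hop to a compute node
ssh -X n001
```

Each ssh in the chain sets a DISPLAY in the remote session that points back at its own local X11 proxy, which is why the hops compose and X11 clients on the compute node render on the desktop.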
The problem comes when you introduce Grid Engine for interactive job scheduling via the qlogin command. With the Mills cluster here at UD, from day one the behavior of qlogin was modified to use a script we provided versus the default; from qconf -sconf:

```
qlogin_command   /opt/shared/GridEngine/local/qlogin_ssh
qlogin_daemon    /usr/sbin/sshd -i
rlogin_command   builtin
rlogin_daemon    builtin
rsh_command      builtin
rsh_daemon       builtin
```
On the compute node the standard SSH daemon is used to accept the qlogin connection from the head node. On the head node, the qlogin connection is made by the /opt/shared/GridEngine/local/qlogin_ssh script:

```sh
#!/bin/sh

HOST=$1
PORT=$2

#
# Ensure that the environment on the remote host will
# match the working env here:
#
export SGE_QLOGIN_PWD=`/bin/pwd`
if [ -z "$WORKGROUP" ]; then
  export WORKGROUP=`id -g -n`
fi
if [ -z "$WORKDIR" ]; then
  WORKDIR=`/opt/shared/workgroup/bin/workdir -g $WORKGROUP`
  if [ $? = 0 ]; then
    export WORKDIR
  fi
fi

if [ "x$DISPLAY" = "x" ]; then
  exec /usr/bin/ssh -p $PORT $HOST
else
  exec /usr/bin/ssh -X -Y -p $PORT $HOST
fi
```
Grid Engine passes two arguments to the script: the hostname of the compute node and a TCP/IP port to use for the session. The SGE_QLOGIN_PWD variable is set to the working directory in which the qlogin command was issued – we'll see why in a moment. If DISPLAY is set, we assume that there is an X11 session active for the user and we tunnel it to the compute node; otherwise, a standard SSH session is opened. Note that we're using exec because there is nothing else for this script to do after opening the connection.
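The DISPLAY test at the end of the script is a portable Bourne-shell idiom (the "x" prefix guards against empty or odd values). The sketch below isolates it so the two branches are easy to see; the function name is ours, not part of the Mills script:

```shell
#!/bin/sh
# Standalone illustration of the DISPLAY-detection idiom from qlogin_ssh.
display_branch() {
  if [ "x$1" = "x" ]; then
    echo "plain ssh session"
  else
    echo "X11-forwarded session"
  fi
}

display_branch ""      # DISPLAY unset or empty
display_branch ":0.0"  # a typical local X11 display
```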
Acting More Like QRsh
The standard behavior of this qlogin is thus to open a connection to a compute node and (as usual for ssh) leave the user in his home directory, running under his default Unix group (on Mills, everyone). Unfortunately, on Mills the idea is to have people transition into secondary groups in order to submit jobs to Grid Engine. The default qrsh under Grid Engine propagates the current Unix group to the compute node (if the sysadmin wants it to), so the user's compute node environment is more similar to the one that was on the head node.
The nature of the work environment being promoted on Mills dictates that qlogin sessions would work best if the remote environment:
- uses the same Unix group that was active on the head node
- uses the same working directory that was active on the head node
To accomplish this, we first need to modify which environment variables ssh will pass to the compute node. On the head node, the following was added to /etc/ssh/ssh_config:

```
SendEnv XMODIFIERS
SendEnv WORKGROUP WORKDIR SGE_QLOGIN_PWD
```
Likewise, the SSH daemon on the compute nodes must be configured (in /etc/ssh/sshd_config) to accept those variables into the remote environment:

```
AcceptEnv XMODIFIERS
AcceptEnv WORKGROUP WORKDIR SGE_QLOGIN_PWD
```
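Once both sides are reconfigured (and sshd reloaded on the compute nodes), a quick way to confirm the variables survive the hop is to echo them through a non-interactive ssh; the hostname here is illustrative:

```
# Run on the head node with WORKGROUP/WORKDIR set in the current shell:
ssh n001 'echo WORKGROUP=$WORKGROUP WORKDIR=$WORKDIR SGE_QLOGIN_PWD=$SGE_QLOGIN_PWD'
```

If the variables come back empty, the SendEnv/AcceptEnv pairing is the first thing to recheck.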
With these three variables being passed to the compute node, an /etc/profile.d script can detect their presence and react accordingly: changing the working directory to SGE_QLOGIN_PWD, and possibly changing the group associated with the process using the value of WORKGROUP and the workgroup command available on Mills (similar to newgrp).
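A minimal sketch of such a profile.d script follows. The filename and the workgroup command's invocation are assumptions on our part, not the actual Mills implementation; the group switch is done first, on the assumption that workgroup (like newgrp) starts a fresh login shell that will re-run this script and then perform the cd under the correct group:

```sh
# Hypothetical /etc/profile.d/qlogin.sh -- a sketch, not the Mills original.
# Runs at login on a compute node; reacts to the variables sent by qlogin_ssh.
if [ -n "$SGE_QLOGIN_PWD" ]; then
  # Not yet running under the requested group? Re-exec under it via the
  # Mills workgroup command (its -g flag is an assumption here).
  if [ -n "$WORKGROUP" ] && [ "`id -g -n`" != "$WORKGROUP" ]; then
    exec workgroup -g "$WORKGROUP"
  fi
  # Group already matches (or was never requested): return to the
  # directory from which qlogin was originally issued.
  if [ -d "$SGE_QLOGIN_PWD" ]; then
    cd "$SGE_QLOGIN_PWD"
  fi
fi
```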