Alterations to the workgroup Command

Users of the IT-RCI clusters have a primary gid (group id) of everyone (gid 900), but submitting and executing jobs requires a different effective gid. The workgroup command executes a command or spawns a new shell whose effective gid is that of a secondary group of which the user is a member. For example, a member of the it_nss workgroup on Caviness can do the following:

[user@login00 ~]$ id -gn
everyone
[user@login00 ~]$ workgroup -g it_nss
[(it_nss:user)@login00 ~]$ id -gn
it_nss

At login the effective gid was everyone. After issuing the workgroup command, the new shell has an effective gid of it_nss, which is also reflected in the shell prompt.

Issues

One issue with the workgroup command lies in the inheritance of the environment. The Bash shell environment includes unexported variables, aliases, array-valued variables, and functions. None of these are part of the POSIX environment that sub-processes inherit, so they are lost in the new shell:

[user@login00 ~]$ bash_function() { echo "OK"; }
[user@login00 ~]$ declare -a var_array
[user@login00 ~]$ var_array+=(1 2 3)
[user@login00 ~]$ bash_function
OK
[user@login00 ~]$ echo ${var_array[@]}
1 2 3
 
[user@login00 ~]$ workgroup -g it_nss
 
[(it_nss:user)@login00 ~]$ bash_function
bash: bash_function: command not found...
[(it_nss:user)@login00 ~]$ echo ${var_array[@]}
 
[(it_nss:user)@login00 ~]$
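
The same split between what a child process does and does not inherit can be reproduced without the workgroup command at all. The following is a minimal sketch that starts a child bash directly; the names exported_var, local_var, and demo_function are made up for the illustration and do not appear in any cluster configuration.

```shell
# Illustration only: all variable and function names here are hypothetical.
exported_var="visible"
export exported_var              # part of the POSIX environment
local_var="hidden"               # never exported, stays in this shell
demo_function() { echo "OK"; }   # shell function, not exported

# Ask a child bash what it can see.
child_report=$(bash -c '
    echo "exported_var=${exported_var:-unset}"
    echo "local_var=${local_var:-unset}"
    if declare -F demo_function >/dev/null; then
        echo "demo_function=defined"
    else
        echo "demo_function=undefined"
    fi
')
echo "$child_report"
```

Only the exported variable reaches the child; the unexported variable and the function do not, which matches what happens to bash_function and var_array in the session above.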

A tool like VALET that makes alterations to the current shell's environment may edit standard variables like PATH – which will carry over into the workgroup shell – but may also introduce Bash-specific entities that do not. This leaves the workgroup shell in an odd hybrid state:

[user@login00 ~]$ vpkg_require r/default
Adding dependency `gcc/4.9.4` to your environment
Adding dependency `atlas/3.10.3` to your environment
Adding package `r/3.5.1` to your environment
[user@login00 ~]$ R-search
Library                Valet Package              R Versn
--------------------   -----------------------    -------
   :
 
[user@login00 ~]$ workgroup -g it_nss
[(it_nss:user)@login00 ~]$ R-search
bash: R-search: command not found...
[(it_nss:user)@login00 ~]$ vpkg_require r/default
[(it_nss:user)@login00 ~]$ R-search
bash: R-search: command not found...
[(it_nss:user)@login00 ~]$ vpkg_rollback
ERROR:  no environment snapshots for the current shell, unable to roll back

The R-search command is an alias, so it does not make it to the workgroup shell. The environment variables do, though, so VALET sees that r/default has already been loaded and will not load it again. Since the new shell has a different VALET identity, no snapshots are present to allow vpkg_rollback to remove the changes.
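
The hybrid state can be imitated with a toy stand-in for a package loader. This is a sketch only and is not how VALET is implemented: DEMO_LOADED_PACKAGES is a made-up bookkeeping variable and demo_search is a made-up alias playing the role of R-search.

```shell
# Hypothetical stand-in for a package loader; NOT VALET's actual mechanism.
DEMO_LOADED_PACKAGES="r/3.5.1"       # made-up "already loaded" marker
export DEMO_LOADED_PACKAGES
alias demo_search='echo searching'   # shell-local helper, like R-search

# A child shell sees the marker but not the alias.
hybrid_state=$("$(command -v bash)" -c '
    echo "marker=${DEMO_LOADED_PACKAGES:-unset}"
    if alias demo_search >/dev/null 2>&1; then
        echo "alias=defined"
    else
        echo "alias=undefined"
    fi
')
echo "$hybrid_state"
```

The child believes the package is loaded because the marker survived, yet the helper that came with the package is gone. A loader that trusts the marker would then decline to reload, leaving the helper permanently missing in that shell.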

In the past, users were cautioned to use the workgroup command before introducing any packages into the runtime environment:

[(it_nss:user)@login00 ~]$ exit
[user@login00 ~]$ vpkg_rollback all
[user@login00 ~]$ /opt/shared/workgroup/bin/workgroup -g it_nss
[(it_nss:user)@login00 ~]$ vpkg_require r/default
Adding dependency `gcc/4.9.4` to your environment
Adding dependency `atlas/3.10.3` to your environment
Adding package `r/3.5.1` to your environment
[(it_nss:user)@login00 ~]$ R-search
Library                Valet Package              R Versn
--------------------   -----------------------    -------
   :

Ideally, having the workgroup command start the new shell with a pristine environment devoid of all modifications, augmented by the standard login scripts (e.g. ~/.bash_profile), would be far more useful.

Solution

To get around the standard behavior of a subprocess inheriting the POSIX environment variables of its parent, the env command can be used:

NAME
       env - run a program in a modified environment

SYNOPSIS
       env [OPTION]... [-] [NAME=VALUE]... [COMMAND [ARG]...]

DESCRIPTION
       Set each NAME to VALUE in the environment and run COMMAND.

       Mandatory arguments to long options are mandatory for short options too.

       -i, --ignore-environment
              start with an empty environment

Rather than having workgroup execute the newgrp command directly, it can be executed indirectly by the env command:

/usr/bin/env -i /usr/bin/newgrp - it_nss
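
The effect of the -i option can be checked in isolation, without newgrp or any group membership. The sketch below launches a child bash twice, once normally and once through env -i; DEMO_SETTING is a made-up variable used only for this illustration.

```shell
# DEMO_SETTING is a hypothetical variable used only for this illustration.
DEMO_SETTING="from-parent"
export DEMO_SETTING

# env -i also discards PATH, so resolve bash to an absolute path up front
# rather than relying on a PATH search after the environment is emptied.
bash_path=$(command -v bash)

# Normal launch: the child inherits the exported variable.
inherited=$(/usr/bin/env "$bash_path" -c 'echo "${DEMO_SETTING:-unset}"')

# Launch through env -i: the child starts with an empty environment.
scrubbed=$(/usr/bin/env -i "$bash_path" -c 'echo "${DEMO_SETTING:-unset}"')

echo "inherited: $inherited"
echo "scrubbed:  $scrubbed"
```

The first launch reports from-parent and the second reports unset, which is the scrubbing that the env -i prefix adds in front of newgrp.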

A modified version of the workgroup command was produced, and minor alterations were made to the Caviness cluster's login scripts to ensure the workgroup prompt is still set as expected. To test:

[user@login00 ~]$ vpkg_require r/default
Adding dependency `gcc/4.9.4` to your environment
Adding dependency `atlas/3.10.3` to your environment
Adding package `r/3.5.1` to your environment
[user@login00 ~]$ workgroup -g it_nss
[(it_nss:user)@login00 ~]$ which R
/usr/bin/which: no R in (/opt/shared/workgroup/20200723/bin:/home/1001/bin:/opt/shared/valet/2.1/bin/bash:/opt/shared/valet/2.1/bin:/opt/shared/slurm/add-ons/bin:/opt/shared/slurm/bin:/usr/lib64/qt-3.3/bin:/opt/shared/gqueue/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin)
[(it_nss:user)@login00 ~]$ vpkg_history
 
[(it_nss:user)@login00 ~]$ vpkg_require r/default
Adding dependency `gcc/4.9.4` to your environment
Adding dependency `atlas/3.10.3` to your environment
Adding package `r/3.5.1` to your environment
[(it_nss:user)@login00 ~]$ which R
/opt/shared/r/3.5.1/bin/R

The workgroup shell starts with a clean environment: the vpkg_require r/default from the original shell did not carry over into the workgroup shell. The workgroup shell's prompt is working as before. This is the desired behavior.

Note that a clean environment is not the desired behavior when workgroup is used to execute a command (using the -C flag). In that mode the environment is not scrubbed: the variables in the parent shell are still passed to the command when it is executed.

Implementation

Updates to the code have been pushed to the official repository. The updated version of the workgroup command is available under /opt/shared/workgroup/20200723/bin on Caviness and Farber for testing.

Timeline

Date        Time   Goal/Description
2020-07-23         Authoring of this document
2020-07-30  09:00  Activation on Caviness and Farber