Alterations to the workgroup Command
Users of the IT-RCI clusters have a primary gid (group id) of everyone (or 900), but when submitting and executing jobs the user needs an effective gid other than that. The workgroup command is used to execute a command or spawn a new shell that has an effective gid of a secondary group of which the user is a member. For example, a member of the it_nss workgroup on Caviness can do the following:
    [user@login00 ~]$ id -gn
    everyone
    [user@login00 ~]$ workgroup -g it_nss
    [(it_nss:user)@login00 ~]$ id -gn
    it_nss
At login the effective gid was everyone. After issuing the workgroup command, the new shell has an effective gid of it_nss, which is also reflected in the shell prompt.
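Because workgroup only accepts a group the user already belongs to, it can help to check membership first. The following is a minimal sketch, not part of the workgroup command itself: the helper name is_group_member is hypothetical, and it assumes group names contain no spaces.

```shell
#!/bin/bash
# Hypothetical helper (not provided by the workgroup command): succeed if the
# current user belongs to the named group, as primary or secondary group.
# Assumes group names contain no spaces.
is_group_member() {
    # `id -Gn` prints every group name for the current user on one line;
    # put one name per line and look for an exact, literal match.
    id -Gn | tr ' ' '\n' | grep -qxF -- "$1"
}

# `id -gn` prints the effective group name of the current shell.
echo "effective group: $(id -gn)"

if is_group_member "it_nss"; then
    echo "member of it_nss"
else
    echo "not a member of it_nss"
fi
```

The effective group printed here is the same value the examples above inspect with id -gn before and after running workgroup.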
Issues
One issue with the workgroup command is the inheritance of the environment. The Bash shell environment includes unexported variables, aliases, array-valued variables, and functions that are not part of the POSIX environment that sub-processes inherit:
    [user@login00 ~]$ bash_function() { echo "OK"; }
    [user@login00 ~]$ declare -a var_array
    [user@login00 ~]$ var_array+=(1 2 3)
    [user@login00 ~]$ bash_function
    OK
    [user@login00 ~]$ echo ${var_array[@]}
    1 2 3
    [user@login00 ~]$ workgroup -g it_nss
    [(it_nss:user)@login00 ~]$ bash_function
    bash: bash_function: command not found...
    [(it_nss:user)@login00 ~]$ echo ${var_array[@]}

    [(it_nss:user)@login00 ~]$
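The same effect can be reproduced without workgroup, since it comes from ordinary process inheritance. The sketch below uses a plain child Bash process in place of the workgroup shell. The names EXPORTED_VAR, unexported_var, and child_view are made up for illustration.

```shell
#!/bin/bash
# Define one of each kind of entity in the current (parent) shell.
export EXPORTED_VAR="exported"      # part of the POSIX environment
unexported_var="unexported"         # shell-local variable
declare -a var_array=(1 2 3)        # arrays cannot be exported
bash_function() { echo "OK"; }      # function, not exported

# Report what a brand-new child Bash process can see.
child_view() {
    bash -c '
        echo "EXPORTED_VAR=${EXPORTED_VAR:-<unset>}"
        echo "unexported_var=${unexported_var:-<unset>}"
        echo "var_array=${var_array[*]:-<unset>}"
        if declare -F bash_function >/dev/null; then
            echo "bash_function=defined"
        else
            echo "bash_function=<undefined>"
        fi
    '
}

child_view
```

Only EXPORTED_VAR reaches the child. The unexported variable, the array, and the function are all missing there, which matches the behavior seen in the workgroup shell.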
A tool like VALET that alters the current shell's environment may edit standard variables like PATH, which carry over into the workgroup shell, but may also introduce Bash-specific entities that do not. This leaves the workgroup shell in an odd hybrid state:
    [user@login00 ~]$ vpkg_require r/default
    Adding dependency `gcc/4.9.4` to your environment
    Adding dependency `atlas/3.10.3` to your environment
    Adding package `r/3.5.1` to your environment
    [user@login00 ~]$ R-search
    Library              Valet Package           R Versn
    -------------------- ----------------------- -------
       :
    [user@login00 ~]$ workgroup -g it_nss
    [(it_nss:user)@login00 ~]$ R-search
    bash: R-search: command not found...
    [(it_nss:user)@login00 ~]$ vpkg_require r/default
    [(it_nss:user)@login00 ~]$ R-search
    bash: R-search: command not found...
    [(it_nss:user)@login00 ~]$ vpkg_rollback
    ERROR: no environment snapshots for the current shell, unable to roll back
The R-search command is an alias, so it does not make it to the workgroup shell. The environment variables do, though, so VALET sees that r/default has already been loaded and will not load it again. Since the new shell has a different VALET identity, no snapshots are present to allow vpkg_rollback to remove the changes.
In the past, users were cautioned to use the workgroup command prior to introducing any packages into the runtime environment:
    [(it_nss:user)@login00 ~]$ exit
    [user@login00 ~]$ vpkg_rollback all
    [user@login00 ~]$ /opt/shared/workgroup/bin/workgroup -g it_nss
    [(it_nss:user)@login00 ~]$ vpkg_require r/default
    Adding dependency `gcc/4.9.4` to your environment
    Adding dependency `atlas/3.10.3` to your environment
    Adding package `r/3.5.1` to your environment
    [(it_nss:user)@login00 ~]$ R-search
    Library              Valet Package           R Versn
    -------------------- ----------------------- -------
       :
Ideally, the workgroup command would start the new shell with a pristine environment devoid of all modifications, augmented by the standard login scripts (e.g. ~/.bash_profile). That would be far more useful.
Solution
To get around the standard behavior of a subprocess inheriting the POSIX environment variables of its parent, the env command can be used:
    NAME
           env - run a program in a modified environment

    SYNOPSIS
           env [OPTION]... [-] [NAME=VALUE]... [COMMAND [ARG]...]

    DESCRIPTION
           Set each NAME to VALUE in the environment and run COMMAND.

           Mandatory arguments to long options are mandatory for short options too.

           -i, --ignore-environment
                  start with an empty environment
Rather than having workgroup execute the newgrp command directly, newgrp can be executed indirectly through the env command:
/usr/bin/env -i /usr/bin/newgrp - it_nss
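The effect of env -i can be checked in isolation. This sketch substitutes /bin/sh for newgrp so that it runs without any group membership. The variable names DEMO_VAR and KEEP_ME are made up for illustration.

```shell
#!/bin/bash
export DEMO_VAR="set in parent"

# A normal child process inherits the exported variable.
inherited=$(/bin/sh -c 'echo "${DEMO_VAR:-<unset>}"')

# A child started through `env -i` begins with an empty environment.
cleaned=$(/usr/bin/env -i /bin/sh -c 'echo "${DEMO_VAR:-<unset>}"')

# NAME=VALUE arguments (see the SYNOPSIS) re-add selected variables.
kept=$(/usr/bin/env -i KEEP_ME=yes /bin/sh -c 'echo "${KEEP_ME:-<unset>}"')

echo "inherited: $inherited"
echo "env -i:    $cleaned"
echo "re-added:  $kept"
```

The first child sees DEMO_VAR and the second does not. In the third, only the explicitly named KEEP_ME is present. The command to run is given by absolute path because PATH is also cleared by -i.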
A modified version of the workgroup command was produced, and minor alterations were made to the Caviness cluster's login scripts to ensure the workgroup prompt is still set as expected. To test:
    [user@login00 ~]$ vpkg_require r/default
    Adding dependency `gcc/4.9.4` to your environment
    Adding dependency `atlas/3.10.3` to your environment
    Adding package `r/3.5.1` to your environment
    [user@login00 ~]$ workgroup -g it_nss
    [(it_nss:user)@login00 ~]$ which R
    /usr/bin/which: no R in (/opt/shared/workgroup/20200723/bin:/home/1001/bin:/opt/shared/valet/2.1/bin/bash:/opt/shared/valet/2.1/bin:/opt/shared/slurm/add-ons/bin:/opt/shared/slurm/bin:/usr/lib64/qt-3.3/bin:/opt/shared/gqueue/bin:/usr/local/bin:/usr/bin:/usr/local/sbin:/usr/sbin)
    [(it_nss:user)@login00 ~]$ vpkg_history
    [(it_nss:user)@login00 ~]$ vpkg_require r/default
    Adding dependency `gcc/4.9.4` to your environment
    Adding dependency `atlas/3.10.3` to your environment
    Adding package `r/3.5.1` to your environment
    [(it_nss:user)@login00 ~]$ which R
    /opt/shared/r/3.5.1/bin/R
The workgroup shell starts with a clean environment: the vpkg_require r/default from the original shell did not carry over into the workgroup shell. The workgroup shell's prompt is working as before. This is the desired behavior.
The workgroup command is also used to execute commands (using the -C flag). In that mode, all variables in the parent shell will still be passed to the command when it is executed.
Implementation
Updates to the code have been pushed to the official repository. The updated version of the workgroup command is available under /opt/shared/workgroup/20200723/bin on Caviness and Farber for testing.
Timeline
Date | Time | Goal/Description |
---|---|---|
2020-07-23 | | Authoring of this document |
2020-07-30 | 09:00 | Activation on Caviness and Farber |