Singularity on Caviness

There are several ways to get a Singularity container onto Caviness. If reproducibility is important, consider one of these two:

Build a Singularity container on Caviness from a Docker Hub or Singularity Hub image.
Build a container on your local system and copy it to Caviness.

Caviness does not support Docker, but you can build a Singularity container from an existing Docker container or from a Singularity Hub container registry. Starting on the head node, load the Singularity software into your environment using VALET, then build your container.

Docker Hub example

$ vpkg_require singularity
$ singularity build tensorflow.simg docker://tensorflow/tensorflow

Singularity Hub example

$ vpkg_require singularity
$ singularity build hello-world.simg shub://vsoch/hello-world

More details on this can be found in the Singularity User Guide.
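
Since reproducibility is the stated goal, it can help to pin an explicit image tag instead of relying on whatever the registry currently serves by default. The sketch below only prints the build command so it can be reviewed before running it. build_cmd is a hypothetical helper (not part of Singularity or VALET), and the tag 1.8.0 is a placeholder; check the registry for the tags that actually exist.

```shell
#!/bin/sh
# Print (rather than run) a singularity build command for a pinned Docker Hub tag.
# build_cmd is a hypothetical helper, not part of Singularity or VALET.
build_cmd() {
    image="$1"            # Docker Hub repository, e.g. tensorflow/tensorflow
    tag="$2"              # explicit tag; placeholder value, verify it on the registry
    name="${image##*/}"   # strip the namespace: tensorflow/tensorflow -> tensorflow
    printf 'singularity build %s-%s.simg docker://%s:%s\n' "$name" "$tag" "$image" "$tag"
}

build_cmd tensorflow/tensorflow 1.8.0
```

Putting the tag in the output file name keeps several versions of the same image distinguishable in one directory.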

One example of a container registry is https://biocontainers.pro/registry/#/.

You can build your Singularity container on your local system, installing the singularity program first if you need to. For more information on how to do this, see the Singularity web site. Then copy your container to Caviness using the usual file transfer methods. See Transferring files to/from Caviness for more information.
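
One way to confirm that a container image arrived intact is to compare checksums on both ends of the transfer. This is a minimal sketch, assuming sha256sum (GNU coreutils) is available on both systems; on macOS, shasum -a 256 is the usual substitute. The helper names, the file name mycontainer.simg, and the user and host placeholders in the comments are all hypothetical.

```shell
#!/bin/sh
# checksum_of FILE: print only the SHA-256 digest of FILE.
# Hypothetical helper; assumes sha256sum (GNU coreutils) is installed.
checksum_of() {
    sha256sum "$1" | cut -d ' ' -f 1
}

# same_file A B: succeed when both files have the same digest.
same_file() {
    [ "$(checksum_of "$1")" = "$(checksum_of "$2")" ]
}

# Typical use (placeholders in angle brackets, not run here):
#   checksum_of mycontainer.simg                      # on the local system
#   scp mycontainer.simg <user>@<caviness-login>:~/   # copy to Caviness
#   checksum_of mycontainer.simg                      # on Caviness; digests should match
```

Recording the digest alongside your job scripts also documents exactly which image a result was produced with.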

Once your container is on Caviness, you can run it. Containers must run on Caviness' compute nodes, not on the head node. Remember that you must specify a workgroup before running any jobs on Caviness, then use either the salloc or sbatch command to get access to a compute node. See Running Jobs on Caviness for more information on SLURM and using Caviness' compute nodes.

Inside your interactive session or your batch job you must first issue the command

vpkg_require singularity

Then you can use singularity commands to execute your container, as in the example below.

[traine@login01 ~]$ workgroup -g it_css
[(it_css:traine)@login01 ~]$ salloc --ntasks=1 --cpus-per-task=4
salloc: Granted job allocation 844
salloc: Waiting for resource configuration
salloc: Nodes r00n45 are ready for job
[traine@r00n45 ~]$ vpkg_require singularity
Adding package `singularity/2.5.1` to your environment
[traine@r00n45 ~]$ singularity shell tensorflow.simg 
Singularity: Invoking an interactive shell within container...

Singularity tensorflow.simg:~> tensorboard --help

       USAGE: /usr/local/bin/tensorboard [flags]

Try --helpfull to get a list of all flags.

Singularity tensorflow.simg:~> grep Cpus_allowed_list /proc/$$/status
Cpus_allowed_list:	0-3
Singularity tensorflow.simg:~>
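
The session above uses salloc. For the sbatch route mentioned earlier, a batch script might look like the sketch below. It mirrors the interactive example (one task, four CPUs, the same grep check run through singularity exec). The file name tensorflow_job.qs and the job name are hypothetical, and the script assumes tensorflow.simg is in the submission directory and that a workgroup has already been set with the workgroup command. The snippet writes the job script and submits it only if sbatch is on the PATH.

```shell
#!/bin/sh
# Write a small SLURM job script (hypothetical name: tensorflow_job.qs).
cat > tensorflow_job.qs <<'EOF'
#!/bin/bash -l
#SBATCH --job-name=tf_container
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4

# Load Singularity into the job environment with VALET.
vpkg_require singularity

# Run one command inside the container: show which CPUs the job may use.
singularity exec tensorflow.simg grep Cpus_allowed_list /proc/self/status
EOF

# Submit only where SLURM is available (for example, on a Caviness login node).
if command -v sbatch >/dev/null 2>&1; then
    sbatch tensorflow_job.qs
else
    echo "sbatch not found; wrote tensorflow_job.qs only"
fi
```

Using singularity exec rather than singularity shell keeps the job non-interactive, which is what a batch job requires.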

Locale error

If you see the error below after starting a particular container, it is likely because your current locale is not supported in that container's shell.

[traine@r00n45 ~]$ singularity shell tensorflow.simg
Singularity: Invoking an interactive shell within container...
 
Singularity tensorflow.simg:~> tensorboard --help
Traceback (most recent call last):
  File "/usr/local/bin/tensorboard", line 11, in <module>
    sys.exit(run_main())
  File "/usr/local/lib/python2.7/dist-packages/tensorboard/main.py", line 48, in run_main
    program.setup_environment()
  File "/usr/local/lib/python2.7/dist-packages/tensorboard/program.py", line 57, in setup_environment
    util.setup_logging()
  File "/usr/local/lib/python2.7/dist-packages/tensorboard/util.py", line 50, in setup_logging
    locale.setlocale(locale.LC_ALL, '')
  File "/usr/lib/python2.7/locale.py", line 581, in setlocale
    return _setlocale(category, locale)
locale.Error: unsupported locale setting

Once you start the container shell, check which locales are supported by using the locale -a command:

Singularity tensorflow.simg:~> locale -a
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_COLLATE to default locale: No such file or directory
C
C.UTF-8
POSIX

In this particular case, the locale for account traine is set to en_US.UTF-8, which is not supported in this container's shell. To fix the problem, set LANG to a supported locale before starting the container shell.

[traine@r00n45 ~]$ LANG=C.UTF-8 singularity shell tensorflow.simg
Singularity: Invoking an interactive shell within container...
 
Singularity tensorflow.simg:~> tensorboard --help
usage: tensorboard [-h] [--logdir PATH] [--host ADDR] [--port PORT]
                   [--purge_orphaned_data BOOL] [--reload_interval SECONDS]
                   [--db URI] [--inspect] [--tag TAG] [--event_file PATH]
                   [--path_prefix PATH] [--window_title TEXT]
                   [--max_reload_threads COUNT]
                   [--samples_per_plugin SAMPLES_PER_PLUGIN]
                   [--master_tpu_unsecure_channel ADDR]
                   [--debugger_data_server_grpc_port PORT]
                   [--debugger_port PORT]
 
TensorBoard is a suite of web applications for inspecting and understanding
your TensorFlow runs and graphs. https://github.com/tensorflow/tensorboard
 
optional arguments:
  -h, --help            show this help message and exit
  --logdir PATH         Directory where TensorBoard will look to find
                        TensorFlow event files that it can display.
                        TensorBoard will recursively walk the directory
                        structure rooted at logdir, looking for .*tfevents.*
                        files. You may also pass a comma separated list of log
                        directories, and TensorBoard will watch each
                        directory. You can also assign names to individual log
                        directories by putting a colon between the name and
                        the path, as in: `tensorboard
                        --logdir=name1:/path/to/logs/1,name2:/path/to/logs/2`
  --host ADDR           What host to listen to. Defaults to serving on all
                        interfaces. Other commonly used values are 127.0.0.1
                        (localhost) and :: (for IPv6).
  --port PORT           Port to serve TensorBoard on (default: 6006)
  --purge_orphaned_data BOOL
                        Whether to purge data that may have been orphaned due
                        to TensorBoard restarts. Setting
                        --purge_orphaned_data=False can be used to debug data
                        disappearance. (default: True)
  --reload_interval SECONDS
                        How often the backend should load more data, in
                        seconds. Set to 0 to load just once at startup and a
                        negative number to never reload at all. (default: 5.0)
  --db URI              [experimental] sets SQL database URI
  --inspect             Prints digests of event files to command line. This is
                        useful when no data is shown on TensorBoard, or the
                        data shown looks weird. Example usage: `tensorboard
                        --inspect --logdir mylogdir --tag loss` See
                        tensorflow/python/summary/event_file_inspector.py for
                        more info.
  --tag TAG             tag to query for; used with --inspect
  --event_file PATH     The particular event file to query for. Only used if
                        --inspect is present and --logdir is not specified.
  --path_prefix PATH    An optional, relative prefix to the path, e.g.
                        "/path/to/tensorboard". resulting in the new base url
                        being located at localhost:6006/path/to/tensorboard
                        under default settings. A leading slash is required
                        when specifying the path_prefix, however trailing
                        slashes can be omitted. The path_prefix can be
                        leveraged for path based routing of an elb when the
                        website base_url is not available e.g.
                        "example.site.com/path/to/tensorboard/".
  --window_title TEXT   changes title of browser window
  --max_reload_threads COUNT
                        The max number of threads that TensorBoard can use to
                        reload runs. Not relevant for db mode. Each thread
                        reloads one run at a time.
  --samples_per_plugin SAMPLES_PER_PLUGIN
                        An optional comma separated list of
                        plugin_name=num_samples pairs to explicitly specify
                        how many samples to keep per tag for that plugin. For
                        unspecified plugins, TensorBoard randomly downsamples
                        logged summaries to reasonable values to prevent out-
                        of-memory errors for long running jobs. This flag
                        allows fine control over that downsampling. Note that
                        0 means keep all samples of that type. For instance
                        "scalars=500,images=0" keeps 500 scalars and all
                        images. Most users should not need to set this flag.
 
profile plugin:
  --master_tpu_unsecure_channel ADDR
                        IP address of "master tpu", used for getting streaming
                        trace data through tpu profiler analysis grpc. The
                        grpc channel is not secured.
 
debugger plugin:
  --debugger_data_server_grpc_port PORT
                        The port at which the non-interactive debugger data
                        server should receive debugging data via gRPC from one
                        or more debugger-enabled TensorFlow runtimes. No
                        debugger plugin or debugger data server will be
                        started if this flag is not provided. This flag
                        differs from the `--debugger_port` flag in that it
                        starts a non-interactive mode. It is for use with the
                        "health pills" feature of the Graph Dashboard. This
                        flag is mutually exclusive with `--debugger_port`.
  --debugger_port PORT  The port at which the interactive debugger data server
                        (to be started by the debugger plugin) should receive
                        debugging data via gRPC from one or more debugger-
                        enabled TensorFlow runtimes. No debugger plugin or
                        debugger data server will be started if this flag is
                        not provided. This flag differs from the
                        `--debugger_data_server_grpc_port` flag in that it
                        starts an interactive mode that allows user to pause
                        at selected nodes inside a TensorFlow Graph or between
                        Session.runs. It is for use with the interactive
                        Debugger Dashboard. This flag is mutually exclusive
                        with `--debugger_data_server_grpc_port`.
Singularity tensorflow.simg:~>
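
The manual check with locale -a can be scripted. The sketch below reads a locale -a listing on standard input and prints the first usable choice: your preferred locale if the container lists it, otherwise C.UTF-8, otherwise C. pick_locale is a hypothetical helper, not a Singularity feature. It matches names exactly, so a listing that spells a locale as en_US.utf8 will not match en_US.UTF-8 and the helper falls back to C.UTF-8.

```shell
#!/bin/sh
# pick_locale WANTED: read a `locale -a` listing on stdin and print the
# first of WANTED, C.UTF-8, C that appears in it (exact, whole-line match).
# Hypothetical helper; returns non-zero if none of the three is listed.
pick_locale() {
    wanted="$1"
    available=$(cat)   # keep the whole listing so it can be searched repeatedly
    for candidate in "$wanted" C.UTF-8 C; do
        if printf '%s\n' "$available" | grep -qxF "$candidate"; then
            printf '%s\n' "$candidate"
            return 0
        fi
    done
    return 1
}

# Possible use on a compute node (not run here):
#   LANG=$(singularity exec tensorflow.simg locale -a 2>/dev/null | pick_locale "$LANG") \
#       singularity shell tensorflow.simg
```

With the listing shown earlier (C, C.UTF-8, POSIX) and a preferred locale of en_US.UTF-8, the helper selects C.UTF-8, the same value used in the manual fix.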

Some common commands are listed here. For more information about Singularity, see the Singularity web site.

`shell`	Start a shell within your container using the operating system you have set up your container to use.
`exec`	Run a single command within your container.
`run`	Run a recipe script you have set up within your container. Using a recipe script forces users of your container to use a pre-established workflow.
`help`	Provides help on Singularity.
