Singularity on Caviness
Build a Singularity container
There are several ways to get a Singularity container onto Caviness. If reproducibility is important, consider one of these two:
- Build a Singularity container on Caviness from a Docker Hub or Singularity Hub registry.
- Build a container on your local system and copy it to Caviness.
Build a Singularity container from Docker or Singularity Hub Registry
Caviness does not support Docker, but you can build a Singularity container from an existing Docker image or from a container in the Singularity Hub registry. Starting on the head node, load the Singularity software into your environment using VALET, then build your container.
Docker Hub example
$ vpkg_require singularity
$ singularity build tensorflow.simg docker://tensorflow/tensorflow
Singularity Hub example
$ vpkg_require singularity
$ singularity build hello-world.simg shub://vsoch/hello-world
More details on this can be found in the Singularity User Guide.
One example of a container registry is https://biocontainers.pro/registry/#/.
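The examples above use untagged image names, which resolve to the registry's "latest" tag, so rebuilding the same command months later may pull different software. A minimal sketch of a hypothetical helper, build_pinned (not part of Singularity or VALET), that builds from an explicitly tagged Docker Hub image and keeps an existing image instead of overwriting it. The function name, the image-naming scheme, and the tensorflow/tensorflow:1.8.0 tag are illustrative assumptions.

```bash
#!/bin/bash
# Hypothetical helper: build a Singularity image from a *pinned* Docker Hub tag.
# Run on the Caviness head node after "vpkg_require singularity".
build_pinned() {
  repo=$1                                  # e.g. tensorflow/tensorflow
  tag=$2                                   # e.g. 1.8.0 (assumed example tag)
  image="$(basename "$repo")-${tag}.simg"  # e.g. tensorflow-1.8.0.simg
  if [ -e "$image" ]; then
    # Keep the image you already have rather than silently rebuilding it.
    echo "$image already exists; not rebuilding" >&2
  else
    # An explicit :tag avoids the implicit "latest", which can change over time.
    singularity build "$image" "docker://${repo}:${tag}" || return 1
  fi
  # Print the image name so callers know which file to use.
  echo "$image"
}
```

For example, build_pinned tensorflow/tensorflow 1.8.0 runs singularity build tensorflow-1.8.0.simg docker://tensorflow/tensorflow:1.8.0 and prints the image name; putting the tag in the file name makes it obvious later which version a job used.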
Build a container on your local system
You can build your Singularity container on your local system, installing Singularity there first if necessary; see the Singularity web site for instructions. Then copy your container to Caviness using the usual file transfer methods. See the Transferring files to/from Caviness page for more information.
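Building from a recipe (definition) file with Singularity 2.x requires root privileges, which is one reason to build on a system you control rather than on Caviness. A minimal sketch of a hypothetical local helper, build_and_copy, that builds an image with sudo and then copies it with scp. The recipe name mycontainer.def, the image name, and the destination traine@caviness.hpc.udel.edu:containers/ are placeholders; the login host name in particular is an assumption, so use whatever host and transfer method the Transferring files page describes.

```bash
#!/bin/bash
# Hypothetical helper, run on your LOCAL machine (not on Caviness):
# build an image from a recipe file, then copy it to Caviness.
build_and_copy() {
  recipe=$1   # e.g. mycontainer.def (placeholder recipe file)
  image=$2    # e.g. mycontainer.simg
  dest=$3     # e.g. traine@caviness.hpc.udel.edu:containers/ (assumed login host)
  # Building from a recipe needs root privileges on the local system.
  sudo singularity build "$image" "$recipe" || return 1
  # scp is just one of the usual file transfer methods; skip it if the build failed.
  scp "$image" "$dest"
}
```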
Execute your Singularity container through SLURM
Once your container is on Caviness, you can run it. Containers must run on Caviness' compute nodes, not on the head node. Remember that you must specify a workgroup before running any jobs on Caviness; then use either the salloc or sbatch command to get access to a compute node. See the Running Jobs on Caviness page for more information on SLURM and using Caviness' compute nodes.
Inside your interactive session or your batch job, you must first issue the command
vpkg_require singularity
Then you can use the singularity commands to execute your container, as in the example below.
[traine@login01 ~]$ workgroup -g it_css
[(it_css:traine)@login01 ~]$ salloc --ntasks=1 --cpus-per-task=4
salloc: Granted job allocation 844
salloc: Waiting for resource configuration
salloc: Nodes r00n45 are ready for job
[traine@r00n45 ~]$ vpkg_require singularity
Adding package `singularity/2.5.1` to your environment
[traine@r00n45 ~]$ singularity shell tensorflow.simg
Singularity: Invoking an interactive shell within container...
Singularity tensorflow.simg:~> tensorboard --help
usage: tensorboard [-h] [--logdir PATH] [--host ADDR] [--port PORT]
                   [--purge_orphaned_data BOOL] [--reload_interval SECONDS]
                   [--db URI] [--inspect] [--tag TAG] [--event_file PATH]
                   [--path_prefix PATH] [--window_title TEXT]
                   [--max_reload_threads COUNT]
                   [--samples_per_plugin SAMPLES_PER_PLUGIN]
                   [--master_tpu_unsecure_channel ADDR]
                   [--debugger_data_server_grpc_port PORT]
                   [--debugger_port PORT]
TensorBoard is a suite of web applications for inspectinng and understanding
your TensorFlow runs and graphs. https://github.com/tensorflow/tensorboard
Singularity tensorflow.simg:~> grep Cpus_allowed_list /proc/$$/status
Cpus_allowed_list: 0-3
Singularity tensorflow.simg:~> exit
exit
[traine@r00n45 ~]$
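The transcript above uses an interactive salloc session. In a batch job, the same steps go into a script submitted with sbatch. A minimal sketch, assuming tensorflow.simg sits in the directory you submit from and that the image provides python with TensorFlow installed; the script name singularity-job.qs and the job name are hypothetical, and your workgroup may need additional options (a partition, for example), so check the Running Jobs on Caviness page.

```bash
#!/bin/bash
#
# Sketch of a SLURM batch script (hypothetical name: singularity-job.qs).
# The resource requests mirror the salloc example above.
#SBATCH --ntasks=1
#SBATCH --cpus-per-task=4
#SBATCH --job-name=singularity-example

# Add Singularity to the job's environment, just as in an interactive session.
vpkg_require singularity

# Run a single command inside the container non-interactively with "exec".
# If you hit the locale error described under Troubleshooting, prefix this
# line with LANG=C.UTF-8 (or another locale the container supports).
singularity exec tensorflow.simg python -c 'import tensorflow as tf; print(tf.__version__)'
```

Submit it from a workgroup shell, for example by running workgroup -g it_css and then sbatch singularity-job.qs.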
Troubleshooting
Locale error
If you see the error below after starting a particular container, it is likely because your current locale is not supported by that container's shell.
[traine@r00n45 ~]$ singularity shell tensorflow.simg
Singularity: Invoking an interactive shell within container...
Singularity tensorflow.simg:~> tensorboard --help
Traceback (most recent call last):
  File "/usr/local/bin/tensorboard", line 11, in <module>
    sys.exit(run_main())
  File "/usr/local/lib/python2.7/dist-packages/tensorboard/main.py", line 48, in run_main
    program.setup_environment()
  File "/usr/local/lib/python2.7/dist-packages/tensorboard/program.py", line 57, in setup_environment
    util.setup_logging()
  File "/usr/local/lib/python2.7/dist-packages/tensorboard/util.py", line 50, in setup_logging
    locale.setlocale(locale.LC_ALL, '')
  File "/usr/lib/python2.7/locale.py", line 581, in setlocale
    return _setlocale(category, locale)
locale.Error: unsupported locale setting
Once you start the container shell, check which locales it supports with the locale -a command:
Singularity tensorflow.simg:~> locale -a
locale: Cannot set LC_CTYPE to default locale: No such file or directory
locale: Cannot set LC_MESSAGES to default locale: No such file or directory
locale: Cannot set LC_COLLATE to default locale: No such file or directory
C
C.UTF-8
POSIX
In this case, the LANG setting for account traine is en_US.UTF-8, which this container does not support. To fix this, set LANG to a supported locale, such as C.UTF-8, when starting the container shell:
[traine@r00n45 ~]$ LANG=C.UTF-8 singularity shell tensorflow.simg
Singularity: Invoking an interactive shell within container...
Singularity tensorflow.simg:~> tensorboard --help
usage: tensorboard [-h] [--logdir PATH] [--host ADDR] [--port PORT]
                   [--purge_orphaned_data BOOL] [--reload_interval SECONDS]
                   [--db URI] [--inspect] [--tag TAG] [--event_file PATH]
                   [--path_prefix PATH] [--window_title TEXT]
                   [--max_reload_threads COUNT]
                   [--samples_per_plugin SAMPLES_PER_PLUGIN]
                   [--master_tpu_unsecure_channel ADDR]
                   [--debugger_data_server_grpc_port PORT]
                   [--debugger_port PORT]
TensorBoard is a suite of web applications for inspectinng and understanding
your TensorFlow runs and graphs. https://github.com/tensorflow/tensorboard
optional arguments:
  -h, --help            show this help message and exit
  --logdir PATH         Directory where TensorBoard will look to find TensorFlow event files that it can display. TensorBoard will recursively walk the directory structure rooted at logdir, looking for .*tfevents.* files. You may also pass a comma separated list of log directories, and TensorBoard will watch each directory. You can also assign names to individual log directories by putting a colon between the name and the path, as in: `tensorboard --logdir=name1:/path/to/logs/1,name2:/path/to/logs/2`
  --host ADDR           What host to listen to. Defaults to serving on all interfaces. Other commonly used values are 127.0.0.1 (localhost) and :: (for IPv6).
  --port PORT           Port to serve TensorBoard on (default: 6006)
  --purge_orphaned_data BOOL
                        Whether to purge data that may have been orphaned due to TensorBoard restarts. Setting --purge_orphaned_data=False can be used to debug data disappearance. (default: True)
  --reload_interval SECONDS
                        How often the backend should load more data, in seconds. Set to 0 to load just once at startup and a negative number to never reload at all. (default: 5.0)
  --db URI              [experimental] sets SQL database URI
  --inspect             Prints digests of event files to command line. This is useful when no data is shown on TensorBoard, or the data shown looks weird. Example usage: `tensorboard --inspect --logdir mylogdir --tag loss` See tensorflow/python/summary/event_file_inspector.py for more info.
  --tag TAG             tag to query for; used with --inspect
  --event_file PATH     The particular event file to query for. Only used if --inspect is present and --logdir is not specified.
  --path_prefix PATH    An optional, relative prefix to the path, e.g. "/path/to/tensorboard". resulting in the new base url being located at localhost:6006/path/to/tensorboard under default settings. A leading slash is required when specifying the path_prefix, however trailing slashes can be omitted. The path_prefix can be leveraged for path based routing of an elb when the website base_url is not available e.g. "example.site.com/path/to/tensorboard/".
  --window_title TEXT   changes title of browser window
  --max_reload_threads COUNT
                        The max number of threads that TensorBoard can use to reload runs. Not relevant for db mode. Each thread reloads one run at a time.
  --samples_per_plugin SAMPLES_PER_PLUGIN
                        An optional comma separated list of plugin_name=num_samples pairs to explicitly specify how many samples to keep per tag for that plugin. For unspecified plugins, TensorBoard randomly downsamples logged summaries to reasonable values to prevent out-of-memory errors for long running jobs. This flag allows fine control over that downsampling. Note that 0 means keep all samples of that type. For instance "scalars=500,images=0" keeps 500 scalars and all images. Most users should not need to set this flag.
profile plugin:
  --master_tpu_unsecure_channel ADDR
                        IP address of "master tpu", used for getting streaming trace data through tpu profiler analysis grpc. The grpc channel is not secured.
debugger plugin:
  --debugger_data_server_grpc_port PORT
                        The port at which the non-interactive debugger data server should receive debugging data via gRPC from one or more debugger-enabled TensorFlow runtimes. No debugger plugin or debugger data server will be started if this flag is not provided. This flag differs from the `--debugger_port` flag in that it starts a non-interactive mode. It is for use with the "health pills" feature of the Graph Dashboard. This flag is mutually exclusive with `--debugger_port`.
  --debugger_port PORT  The port at which the interactive debugger data server (to be started by the debugger plugin) should receive debugging data via gRPC from one or more debugger-enabled TensorFlow runtimes. No debugger plugin or debugger data server will be started if this flag is not provided. This flag differs from the `--debugger_data_server_grpc_port` flag in that it starts an interactive mode that allows user to pause at selected nodes inside a TensorFlow Graph or between Session.runs. It is for use with the interactive Debugger Dashboard. This flag is mutually exclusive with `--debugger_data_server_grpc_port`.
Singularity tensorflow.simg:~>
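If you use several containers, the locale check can be scripted instead of done by hand. A minimal sketch of a hypothetical helper, pick_locale (not part of Singularity or VALET), that reads the output of locale -a and prints your preferred locale if the container has it, otherwise C.UTF-8, otherwise C. It treats spellings such as C.UTF-8 and C.utf8 as equivalent, since locale -a output varies between systems.

```bash
#!/bin/bash
# Hypothetical helper: pick a LANG value that a container supports.
# Reads the output of "locale -a" on stdin and prints the first match among
# the preferred locale ($1, default en_US.UTF-8), C.UTF-8, and finally C.

# Normalize locale names so that e.g. "C.UTF-8" and "C.utf8" compare equal.
norm_locale() {
  tr '[:upper:]' '[:lower:]' | sed 's/-//g'
}

pick_locale() {
  preferred=${1:-en_US.UTF-8}
  available=$(cat)
  for want in "$preferred" C.UTF-8; do
    want_norm=$(printf '%s\n' "$want" | norm_locale)
    # Print the first available locale whose normalized name matches.
    match=$(printf '%s\n' "$available" | while IFS= read -r line; do
      if [ "$(printf '%s\n' "$line" | norm_locale)" = "$want_norm" ]; then
        printf '%s\n' "$line"
        break
      fi
    done)
    if [ -n "$match" ]; then
      printf '%s\n' "$match"
      return 0
    fi
  done
  # The C locale is always available, so fall back to it.
  printf 'C\n'
}
```

For example, on a compute node: LANG=$(singularity exec tensorflow.simg locale -a 2>/dev/null | pick_locale "$LANG") singularity shell tensorflow.simg. With the container shown above and LANG=en_US.UTF-8, this selects C.UTF-8.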
Common Singularity commands
Some common commands are listed here. For more information about Singularity, see the Singularity web site.
| Command | Description |
| shell | Start an interactive shell within your container, using the operating system your container is set up to use. |
| exec | Run a single command within your container. |
| run | Run the runscript defined in your container (the %runscript section of its recipe). A runscript gives users of your container a pre-established workflow. |
| help | Display help on Singularity and its commands. |