

TensorFlow on DARWIN

TensorFlow is a combination of Python code and compiled libraries and tools. Building TensorFlow from source is extremely involved because of the large number of dependencies and additional software packages it requires. Pre-built TensorFlow container images are available on DockerHub, and conda packages are also available (though they often lag behind the current TensorFlow release by a significant margin).

On DARWIN, IT RCI provides TensorFlow to users only as container images, but users are welcome to curate their own Python virtual environments with TensorFlow installed. Use of both variants is documented here.

IT RCI maintains TensorFlow Singularity containers for all users of DARWIN:

$ vpkg_versions tensorflow
 
Available versions in package (* = default version):
 
[/opt/shared/valet/2.1/etc/tensorflow.vpkg_yaml]
tensorflow  official TensorFlow containers
  2.3:rocm  TF 2.3 with ROCM 4.2 AMD GPU support
* 2.8:rocm  TF 2.8 with ROCM 5.2.0 AMD GPU support
  2.9:rocm  TF 2.9 with ROCM 5.2.0 AMD GPU support
  2.14.0    TF 2.14.0 official Docker runtime image
  2.15:rocm TF 2.15 with ROCM 6.1 AMD GPU support
  2.16.1    TF 2.16.1 official Docker runtime image

Write your Python code somewhere in your home directory ($HOME) or under your workgroup directory ($WORKDIR). Speak with other group members to learn how your group uses the workgroup directory, e.g. whether you should create a directory for yourself there.

Assuming you will use your personal workgroup storage directory ($WORKDIR_USER), create a directory therein for your first TensorFlow job:

$ mkdir -p ${WORKDIR_USER}/tf-test-001
$ cd ${WORKDIR_USER}/tf-test-001

For example, if your TensorFlow Python script is called tf-script.py, copy it into (or create it in) the tf-test-001 directory, then copy the tensorflow.qs job script template there:

$ cp /opt/shared/templates/slurm/applications/tensorflow.qs .
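If you do not have a script of your own yet and simply want to confirm that a container starts and can see its devices, a throwaway tf-script.py can be generated with a shell here-document. This is a minimal sketch: the function name make_test_script is hypothetical, and the Python it writes uses only tf.__version__ and tf.config.list_physical_devices().

```shell
# Hypothetical helper: write a minimal TensorFlow smoke-test script into the
# current directory, unless a file with that name already exists.
make_test_script() {
    local script="${1:-tf-script.py}"
    if [ -e "$script" ]; then
        echo "$script already exists; leaving it untouched" >&2
        return 0
    fi
    # The quoted 'EOF' stops the shell from expanding anything in the body.
    cat > "$script" <<'EOF'
import tensorflow as tf

# Report the TensorFlow version bundled in the container.
print("TensorFlow version:", tf.__version__)

# List the GPUs TensorFlow can see (an empty list on CPU-only nodes).
print("GPUs:", tf.config.list_physical_devices("GPU"))
EOF
}
```

Run make_test_script from inside the tf-test-001 directory; an existing tf-script.py is never overwritten.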

The job script template has extensive documentation to help you customize it for your job. At a minimum, you need to specify the version of TensorFlow you want via VALET (the vpkg_require line) and change the last line to run your Python script, which for this example is tf-script.py:

...
 
#
# Add a TensorFlow container to the environment:
#
vpkg_require tensorflow/2.16.1
 
#
# Execute our TensorFlow Python script:
#
python3 tf-script.py
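Both edits can also be scripted with sed, which is handy when preparing many job directories. This sketch assumes (not verified against the actual template) that the template's version line begins with `vpkg_require tensorflow/` and its final command begins with `python3 `; the function name customize_tf_job is hypothetical, and `sed -i` is used in its GNU sed form.

```shell
# Hypothetical helper: set the TensorFlow version and the Python script name
# in a copy of the tensorflow.qs template.
#   $1 = job script      (e.g. tensorflow.qs)
#   $2 = VALET version   (e.g. 2.16.1)
#   $3 = Python script   (e.g. tf-script.py)
customize_tf_job() {
    local job="$1" version="$2" script="$3"
    # ASSUMPTION: the template contains lines starting with these prefixes;
    # lines that do not match are left unchanged.
    sed -i \
        -e "s|^vpkg_require tensorflow/.*|vpkg_require tensorflow/${version}|" \
        -e "s|^python3 .*|python3 ${script}|" \
        "$job"
}
```

For this example: customize_tf_job tensorflow.qs 2.16.1 tf-script.py. Review the result before submitting, since any line not matching the assumed prefixes is silently left as-is.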

Finally, submit the job using the sbatch command:

$ sbatch tensorflow.qs
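When submitting from a script, `sbatch --parsable` prints only the job ID (followed by `;clustername` on multi-cluster setups), which makes the job easy to follow afterwards. A minimal sketch; submit_tf_job is a hypothetical name.

```shell
# Hypothetical helper: submit a job script and print just its numeric job ID.
submit_tf_job() {
    local jobid
    # --parsable prints "jobid" or "jobid;cluster"; keep only the ID part.
    jobid="$(sbatch --parsable "${1:-tensorflow.qs}")" || return 1
    echo "${jobid%%;*}"
}
```

The ID can then be passed to squeue -j to check the job's state. Unless the template sets its own --output, Slurm writes the job's output to slurm-<jobid>.out in the directory you submitted from.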

The DARWIN cluster includes nodes with NVIDIA (CUDA-based) GPGPUs and AMD (ROCm-based) GPUs, and TensorFlow images with support for these coprocessors are available. Check the vpkg_versions tensorflow listing for versions tagged rocm or gpu.
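To pick out the variants for a particular accelerator, the listing can be filtered on its tag. This sketch assumes the output layout shown earlier (an optional `*` default marker followed by a `version[:tag]` column) and that vpkg_versions writes to standard output; versions_with_tag is a hypothetical name.

```shell
# Hypothetical helper: read `vpkg_versions tensorflow` output on stdin and
# print the version IDs whose tag matches $1 (e.g. "rocm" or "gpu").
versions_with_tag() {
    local tag="$1"
    awk -v tag="$tag" '
        {
            # Skip the "*" that marks the default version, if present.
            v = ($1 == "*") ? $2 : $1
            # Version IDs look like "2.15:rocm"; match the ":tag" suffix.
            if (v ~ (":" tag "$")) print v
        }
    '
}
```

With the listing shown earlier, vpkg_versions tensorflow | versions_with_tag rocm would print 2.3:rocm, 2.8:rocm, 2.9:rocm and 2.15:rocm, one per line.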

  • Last modified: 2024-05-20 11:14
  • by frey