software:tensorflow:darwin

Page created 2024-05-20 11:14 by frey; current revision 2024-05-20 11:51 by frey.
The DARWIN cluster includes nodes with NVIDIA (CUDA-based) GPGPUs and AMD (ROCm-based) GPUs.  TensorFlow images with support for these coprocessors are available; check the ''vpkg_versions tensorflow'' listing for versions tagged ''gpu'' or ''rocm''.
  
===== Virtual environments =====

As of 2024, Anaconda (conda) virtual environments are the suggested way to create TensorFlow virtual environments.  This recipe assumes the software is being installed in shared workgroup storage: the environments under ''${WORKDIR_SW}/tensorflow'' and the VALET package definition under ''${WORKDIR_SW}/valet''.

Start by adding the Anaconda base distribution to your environment (version ''2024.02:python3'' is used here; always check ''vpkg_versions anaconda'' for newer versions):

<code bash>
[(my_workgroup:user)@login01.darwin ~]$ vpkg_require anaconda/2024.02:python3
Adding package `anaconda/2024.02:python3` to your environment
[(my_workgroup:user)@login01.darwin ~]$
</code>

Use ''conda search tensorflow'' to locate the specific version and build you wish to install (output abbreviated):

<code bash>
[(my_workgroup:user)@login01.darwin ~]$ conda search tensorflow
Loading channels: done
# Name                       Version           Build  Channel
tensorflow                     1.4.1                pkgs/main
tensorflow                     1.5.0                pkgs/main
   :
tensorflow                    2.11.0 eigen_py310h0f08fec_0  pkgs/main
   :
tensorflow                    2.12.0 gpu_py38h03d86b3_0  pkgs/main
   :
tensorflow                    2.12.0 mkl_py39h5ea9445_0  pkgs/main
</code>

The build string distinguishes variants built for specific devices or libraries.  For example, the final item above is built on the Intel MKL libraries (the ''py39'' component indicates Python 3.9) and corresponds to the fully qualified conda package specification ''tensorflow[version=2.12.0,build=mkl_py39h5ea9445_0]''.
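
Picking the build can also be scripted.  What follows is a minimal sketch that assumes the column layout shown above (name, version, build, channel).  The function name ''tf_specs_for_tag'' is hypothetical and is not part of conda or VALET.  The function reads ''conda search'' output on standard input and, for the requested version, prints the qualified specification of each build whose build string starts with the given tag:

<code bash>
# Hypothetical helper (not part of conda): print fully qualified conda specs
# for TensorFlow builds of VERSION whose build string starts with "TAG_".
# Assumes conda search rows of the form: name  version  build  channel
# Usage: conda search tensorflow | tf_specs_for_tag 2.12.0 mkl
tf_specs_for_tag() {
    local version="$1" tag="$2"
    awk -v ver="$version" -v tag="$tag" '
        # Skip the header ("# Name ...") and any rows without a build column.
        $1 == "tensorflow" && $2 == ver && index($3, tag "_") == 1 {
            printf "tensorflow[version=%s,build=%s]\n", $2, $3
        }'
}
</code>

For example, ''conda search tensorflow | tf_specs_for_tag 2.12.0 mkl'' should print the MKL specification above, provided the channel still carries that build.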

All TensorFlow virtualenvs are stored under the common base directory ''${WORKDIR_SW}/tensorflow''.  Each one must have a unique name, and that name becomes its VALET version.  This tutorial installs the latest version of TensorFlow with MKL support, using the tag ''mkl'' on the version.  The ''vpkg_id2path'' command shows the directory name that corresponds to that VALET version:

<code bash>
[(my_workgroup:user)@login01 ~]$ vpkg_id2path --version-id=2.12.0:mkl
2.12.0-mkl
</code>
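
To avoid typing the installation path by hand, it can be computed from the VALET version id.  This is a minimal sketch; the helper name ''tf_env_prefix'' is hypothetical.  When ''vpkg_id2path'' is available, the helper calls it.  Otherwise it falls back to replacing '':'' with ''-''.  That fallback is an assumption that only reproduces the single example above and is not a description of how ''vpkg_id2path'' works in general:

<code bash>
# Hypothetical helper: print the install prefix for a TensorFlow virtualenv
# given its VALET version id (e.g. "2.12.0:mkl").
tf_env_prefix() {
    local version_id="$1" subdir
    if command -v vpkg_id2path >/dev/null 2>&1; then
        # Preferred: let VALET compute the directory name.
        subdir="$(vpkg_id2path --version-id="$version_id")" || return 1
    else
        # Assumption: mirrors only the example above (2.12.0:mkl -> 2.12.0-mkl).
        subdir="$(printf '%s' "$version_id" | tr ':' '-')"
    fi
    printf '%s/tensorflow/%s\n' "${WORKDIR_SW:?WORKDIR_SW is not set}" "$subdir"
}
</code>

The result can then be passed to ''conda create'', e.g. ''%%--%%prefix="$(tf_env_prefix 2.12.0:mkl)"''.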

Create the virtualenv with the ''%%--%%prefix'' option so that it is installed in that directory:

<code bash>
[(my_workgroup:user)@login01 ~]$ conda create --prefix="${WORKDIR_SW}/tensorflow/2.12.0-mkl" 'tensorflow[version=2.12.0,build=mkl_py39h5ea9445_0]'
   :
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
#
# To activate this environment, use
#
#     $ conda activate /lustre/my_workgroup/sw/tensorflow/2.12.0-mkl
#
# To deactivate an active environment, use
#
#     $ conda deactivate
</code>
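
Before you register the new environment with VALET, a quick sanity check can catch a mistyped prefix.  This is a minimal sketch, and the function name ''tf_env_ok'' is hypothetical.  It relies on two facts: every conda environment contains a ''conda-meta'' directory, and a Python environment provides ''bin/python''.  It does not check that TensorFlow itself imports; the commented line at the end shows one way to do that:

<code bash>
# Hypothetical helper: succeed only if PREFIX looks like a conda environment
# that provides a Python interpreter.
tf_env_ok() {
    local prefix="$1"
    [ -d "$prefix/conda-meta" ] || { echo "not a conda environment: $prefix" >&2; return 1; }
    [ -x "$prefix/bin/python" ] || { echo "no bin/python in: $prefix" >&2; return 1; }
}

# Example (on DARWIN):
#   env="${WORKDIR_SW}/tensorflow/2.12.0-mkl"
#   tf_env_ok "$env" && "$env/bin/python" -c 'import tensorflow as tf; print(tf.__version__)'
</code>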

==== VALET package definition ====

If the workgroup does //not// already have a TensorFlow VALET package definition, adapt the following YAML (e.g. change the ''prefix'' path to match your workgroup) and save it as ''${WORKDIR_SW}/valet/tensorflow.vpkg_yaml'':

<code yaml>
tensorflow:
    prefix: /lustre/my_workgroup/sw/tensorflow
    description: TensorFlow Python environments
    url: "https://www.tensorflow.org"

    flags:
        - no-standard-paths

    versions:
        "2.12.0:mkl":
            description: 2.12.0, mkl_py39h5ea9445_0 build
            dependencies:
                - anaconda/2024.02:python3
            actions:
                - action: source
                  script:
                      sh: anaconda-activate-2024.sh
                  success: 0
</code>
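
The file can also be written from the shell so that the ''prefix'' is filled in from ''${WORKDIR_SW}'' rather than edited by hand.  This is a minimal sketch; the function name ''write_tf_vpkg_yaml'' is hypothetical.  It writes exactly the definition above and refuses to overwrite an existing file, because an existing definition should be extended by hand instead:

<code bash>
# Hypothetical helper: create ${WORKDIR_SW}/valet/tensorflow.vpkg_yaml with the
# definition shown above. Does nothing (returns 1) if the file already exists.
write_tf_vpkg_yaml() {
    local vpkg_file="${WORKDIR_SW:?WORKDIR_SW is not set}/valet/tensorflow.vpkg_yaml"
    if [ -e "$vpkg_file" ]; then
        echo "$vpkg_file already exists; add the new version under 'versions' by hand" >&2
        return 1
    fi
    mkdir -p "$(dirname "$vpkg_file")" || return 1
    # Unquoted EOF delimiter: ${WORKDIR_SW} is expanded when the file is written.
    cat > "$vpkg_file" <<EOF
tensorflow:
    prefix: ${WORKDIR_SW}/tensorflow
    description: TensorFlow Python environments
    url: "https://www.tensorflow.org"

    flags:
        - no-standard-paths

    versions:
        "2.12.0:mkl":
            description: 2.12.0, mkl_py39h5ea9445_0 build
            dependencies:
                - anaconda/2024.02:python3
            actions:
                - action: source
                  script:
                      sh: anaconda-activate-2024.sh
                  success: 0
EOF
}
</code>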

If ''${WORKDIR_SW}/valet/tensorflow.vpkg_yaml'' already exists, add the new version alongside the existing ones under the ''versions'' key:

<code yaml>
               :
        "2.12.0:mkl":
            description: 2.12.0, mkl_py39h5ea9445_0 build
            dependencies:
                - anaconda/2024.02:python3
            actions:
                - action: source
                  script:
                      sh: anaconda-activate-2024.sh
                  success: 0

        "2.12.0:gpu":
            description: 2.12.0, gpu_py311h65739b5_0 build
               :
</code>
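
Adding the same version key twice produces an invalid definition, so it can help to check the file first.  This is a minimal sketch; the function name ''vpkg_has_version'' is hypothetical.  It is a plain text search, not a YAML parser, and it assumes version keys are written in quotes as in the examples above:

<code bash>
# Hypothetical helper: succeed if VERSION_ID (e.g. "2.12.0:mkl") already appears
# as a quoted key in the given VALET YAML file. Plain text match, no YAML parsing.
vpkg_has_version() {
    local vpkg_file="$1" version_id="$2"
    grep -qF -e "\"${version_id}\":" "$vpkg_file"
}

# Example:
#   vpkg_has_version "${WORKDIR_SW}/valet/tensorflow.vpkg_yaml" 2.12.0:mkl && echo "already defined"
</code>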

With a properly constructed package definition file in place, ''vpkg_versions'' lists your versions of TensorFlow:

<code bash>
[(my_workgroup:user)@login01.darwin ~]$ vpkg_versions tensorflow

Available versions in package (* = default version):

[/lustre/my_workgroup/sw/valet/tensorflow.vpkg_yaml]
tensorflow
* 2.12.0:mkl  2.12.0, mkl_py39h5ea9445_0 build
    :
</code>

==== Job scripts ====

Any job script that runs Python code in this virtualenv should include something like the following near its end:

<code bash>
   :

#
# Set up the TensorFlow virtualenv:
#
vpkg_require tensorflow/2.12.0:mkl

#
# Run a Python script in that virtualenv:
#
python3 my_tf_work.py
rc=$?

#
# Do cleanup work, etc.
#

#
# Exit with the exit code returned by the Python script:
#
exit "$rc"
</code>
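
The same pattern can be packaged as a reusable function: run the command, always run the cleanup, then return the command's exit code rather than the cleanup's.  This is a minimal sketch, and the names ''run_with_cleanup'' and ''cleanup'' are hypothetical.  The ''cleanup'' function shown is only a placeholder for your own steps:

<code bash>
# Placeholder cleanup step; replace the body with your own (e.g. copying results).
cleanup() {
    :
}

# Hypothetical helper: run the given command, always call cleanup afterwards,
# and return the command's exit code (not cleanup's).
run_with_cleanup() {
    "$@"
    local rc=$?
    cleanup
    return "$rc"
}

# Example use near the end of a job script:
#   vpkg_require tensorflow/2.12.0:mkl
#   run_with_cleanup python3 my_tf_work.py
#   exit $?
</code>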
  