software:r:caviness

Differences

This shows you the differences between two versions of the page.

software:r:caviness [2019-09-03 13:12] anita
software:r:caviness [2023-11-28 17:36] – [personal/program specific R libraries and extensions] anita
Line 1: Line 1:
-===== R on Caviness ===== 
- 
-==== Learning R ==== 
- 
-=== SWIRL ===
-In addition to other resources, SWIRL is installed on the Caviness cluster and is available as an interactive learning guide 
-inside R: 
- 
-<code> 
-$ vpkg_require r-cran 
-$ R -q --no-save 
-> library(swirl) 
-> swirl() 
-</code> 
- 
- 
- 
-==== R libraries and extensions ==== 
- 
-=== Installed library bundles === 
-The cluster also has the majority of [[http://cran.us.r-project.org/|CRAN]] 
-and [[http://www.bioconductor.org/|Bioconductor]] R libraries already 
-installed.  These are installed as point-in-time snapshots of their 
-respective catalogs and are divided into VALET packages (bundles) 
-according to their dependencies.  The current bundles are listed below.  Together 
-they provide access to over 6,600 R modules, pre-compiled and ready 
-for use. 
- 
-^r-cran         |All CRAN modules which compile and install cleanly without any additional dependencies.  N.B. all of the library bundles below require this CRAN bundle as a base.| 
-^r-cdf          |CRAN modules which need NetCDF, HDF4, HDF5, and UDUNITS libraries. | 
-^r-bioc         |The full suite of [[http://www.bioconductor.org/|Bioconductor]] modules. | 
-^r-fftw         |CRAN modules which need FFTW | 
-^r-geo          |CRAN modules which need GEOS (Geometry Engine, Open Source), GDAL (Geospatial Data Abstraction Library), or PROJ (Cartographic Projections Library)   | 
-^r-gnumath      |CRAN modules which need GSL (GNU Scientific Library), GLPK (GNU Linear Programming Kit), or MPFR (GNU MPFR Library)  | 
-^r-jags         |CRAN modules which need JAGS (Just Another Gibbs Sampler) and the r-gnumath library mentioned above. | 
-^r-graph        |CRAN modules which need Graphviz or GNUplot | 
-^r-mpi          |CRAN modules which need the OpenMPI libraries for parallel computing.  | 
-^r-all          |In addition to loading all the previously mentioned bundles, any CRAN module with multiple dependencies from the above list is also included. | 
-^r-cuda         |CRAN modules which need CUDA/GPUs | 
- 
-=== Loading library bundles for use === 
-<code> 
-$ vpkg_require r-geo 
-Adding dependency `r-bioc/3.5.1:20180715` to your environment 
-Adding dependency `gsl/1.16` to your environment 
-Adding dependency `gmp/6.1.2` to your environment 
-Adding dependency `glpk/4.65` to your environment 
-Adding dependency `mpfr/4.0.1` to your environment 
-Adding dependency `r-gnumath/3.5.1:20180715` to your environment 
-Adding dependency `fftw/3.3.8` to your environment 
-Adding dependency `r-fftw/3.5.1:20180715` to your environment 
-Adding dependency `szip/2.1.1` to your environment 
-Adding dependency `hdf4/4.2.13` to your environment 
-Adding dependency `hdf5/1.10.2` to your environment 
-Adding dependency `netcdf/4.6.1` to your environment 
-Adding dependency `udunits/2.2.26` to your environment 
-Adding dependency `r-cdf/3.5.1:20180715` to your environment 
-Adding dependency `geos/3.6.2` to your environment 
-Adding dependency `gdal/2.3.0` to your environment 
-Adding dependency `proj/5.1.0` to your environment 
-Adding package `r-geo/3.5.1:20180715` to your environment 
-$ 
-</code> 
- 
-A library from the bundle can now be used in R as usual. 
- 
-<code> 
-$ R --no-save -q 
-> library(CopulaRegression) 
-Loading required package: MASS 
-Loading required package: VineCopula 
-> 
-</code> 
- 
-=== Learning about modules === 
-IT provides a small script called ''r-info'' which will display the internal 
-documentation of R modules.  This is helpful to get basic information on 
-a module to decide if it requires more research.  To use this tool, the library 
-must be installed, and the module bundle must be loaded with ''vpkg_require''. 
-For example: 
- 
-<code> 
-$ vpkg_require r-cran 
-$ r-info car 
-Loading required package: carData 
- 
-                Information on package ‘car’ 
- 
-Description: 
- 
-Package:            car 
-Version:            3.0-0 
-Date:               2018-03-23 
-Title:              Companion to Applied Regression 
- 
-... 
- 
-Further information is available in the following vignettes in 
-directory ‘/opt/shared/r/add-ons/r3.5.1/cran/20180715/car/doc’: 
- 
-embedding: Using car functions inside user functions (source, pdf)      
-$ 
-</code> 
- 
-==== Personal/program-specific R libraries and extensions ==== 
-You can create your own library of R modules which contains different 
-versions than provided through VALET, or modules not available via VALET. 
- 
-R looks in an environment variable called ''R_LIBS'' to obtain a list of 
-locations to search for modules.  You should ensure your entry is first 
-in the list.  This allows your library to override any conflicting versions 
-installed on the system.  It also matters because R installs 
-modules into the first entry in this list by default. 
- 
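As a minimal sketch of the ordering rule, the helper below puts a personal directory at the front of ''R_LIBS''. The function name ''prepend_r_libs'' is hypothetical (it is not part of VALET or R), and the paths are stand-ins copied from the example on this page.

```shell
#!/bin/sh
# Hypothetical helper (not part of VALET or R): put a personal library
# directory first in R_LIBS so R searches it, and installs into it, first.
prepend_r_libs() {
    # ${R_LIBS:+:$R_LIBS} expands to ":<old value>" only when R_LIBS is
    # non-empty, which avoids an empty entry at the end of the list.
    R_LIBS="$1${R_LIBS:+:$R_LIBS}"
    export R_LIBS
}

# Stand-in for the value that vpkg_require r-cran sets on Caviness
R_LIBS="/opt/shared/r/add-ons/r3.5.1/cran/20180715"
prepend_r_libs "/work/it_css/sw/r/add-ons/r3.5.1/testing/default"
echo "$R_LIBS"
```

The personal directory now comes before the system one, which is the order ''.libPaths()'' should report inside R.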
-=== Simple example === 
-Once this is done, you can install modules using ''install.packages''. Make sure you are in your workgroup first (e.g. ''workgroup -g <<//investing-entity//>>''). Here 
-is an example: 
- 
-<code> 
-$ workgroup -g it_css 
-$ vpkg_require r-cran 
-Adding dependency `r/3.5.1` to your environment 
-Adding package `r-cran/3.5.1:20180715` to your environment 
-$ mkdir -p $WORKDIR/sw/r/add-ons/r3.5.1/testing/default 
-$ echo $R_LIBS 
-/opt/shared/r/add-ons/r3.5.1/cran/20180715 
-$ R_LIBS="$WORKDIR/sw/r/add-ons/r3.5.1/testing/default:$R_LIBS" 
-$ R -q --no-save 
-> .libPaths() 
-[1] "/work/it_css/sw/r/add-ons/r3.5.1/testing/default" 
-[2] "/opt/shared/r/add-ons/r3.5.1/cran/20180715" 
-[3] "/opt/shared/r/3.5.1/lib64/R/library" 
-> chooseCRANmirror(all) 
-Secure CRAN mirrors 
- 
- 1: 0-Cloud [https]                   2: Algeria [https] 
- 3: Australia (Canberra) [https]      4: Australia (Melbourne 1) [https] 
- 5: Australia (Melbourne 2) [https]   6: Australia (Perth) [https] 
- 7: Austria [https]                   8: Belgium (Ghent) [https] 
- 9: Brazil (PR) [https]              10: Brazil (RJ) [https] 
-11: Brazil (SP 1) [https]            12: Brazil (SP 2) [https] 
-13: Bulgaria [https]                 14: Chile [https] 
-15: China (Hong Kong) [https]        16: China (Lanzhou) [https] 
-17: China (Shanghai) [https]         18: Colombia (Cali) [https] 
-19: Czech Republic [https]           20: Denmark [https] 
-21: Ecuador (Cuenca) [https]         22: Ecuador (Quito) [https] 
-23: Estonia [https]                  24: France (Lyon 2) [https] 
-25: France (Marseille) [https]       26: France (Montpellier) [https] 
-27: Germany (Erlangen) [https]       28: Germany (Göttingen) [https] 
-29: Germany (Münster) [https]        30: Germany (Regensburg) [https] 
-31: Greece [https]                   32: Hungary [https] 
-33: Iceland [https]                  34: Indonesia (Jakarta) [https] 
-35: Italy (Padua) [https]            36: Japan (Tokyo) [https] 
-37: Japan (Yonezawa) [https]         38: Korea (Busan) [https] 
-39: Korea (Gyeongsan-si) [https]     40: Korea (Seoul 1) [https] 
-41: Korea (Ulsan) [https]            42: Malaysia [https] 
-43: Mexico (Mexico City) [https]     44: Norway [https] 
-45: Philippines [https]              46: Serbia [https] 
-47: Spain (Madrid) [https]           48: Sweden [https] 
-49: Switzerland [https]              50: Turkey (Denizli) [https] 
-51: Turkey (Mersin) [https]          52: UK (Bristol) [https] 
-53: UK (London 1) [https]            54: USA (CA 1) [https] 
-55: USA (IA) [https]                 56: USA (KS) [https] 
-57: USA (MI 1) [https]               58: USA (MI 2) [https] 
-59: USA (OR) [https]                 60: USA (TN) [https] 
-61: USA (TX 1) [https]               62: Uruguay [https] 
-63: (other mirrors) 
- 
-Selection: 55 
-> install.packages("KernSmooth", dependencies=TRUE) 
-Installing package into ‘/work/it_css/sw/r/add-ons/r3.5.1/testing/default’ 
-(as ‘lib’ is unspecified) 
- 
-trying URL 'https://ftp.osuosl.org/pub/cran/src/contrib/KernSmooth_2.23-15.tar.gz' 
-Content type 'application/x-gzip' length 24572 bytes (23 KB) 
-================================================== 
-downloaded 23 KB 
- 
-* installing *source* package ‘KernSmooth’ ... 
-** package ‘KernSmooth’ successfully unpacked and MD5 sums checked 
-** libs 
-gfortran   -fpic  -g -O2  -c blkest.f -o blkest.o 
-gfortran   -fpic  -g -O2  -c cp.f -o cp.o 
-gfortran   -fpic  -g -O2  -c dgedi.f -o dgedi.o 
-gfortran   -fpic  -g -O2  -c dgefa.f -o dgefa.o 
-gfortran   -fpic  -g -O2  -c dgesl.f -o dgesl.o 
-gcc -std=gnu99 -I"/opt/shared/r/3.5.1/lib64/R/include" -DNDEBUG   -I/opt/shared/gcc/4.9.4/include   -fpic  -g -O2  -c init.c -o init.o 
-gfortran   -fpic  -g -O2  -c linbin.f -o linbin.o 
-gfortran   -fpic  -g -O2  -c linbin2D.f -o linbin2D.o 
-gfortran   -fpic  -g -O2  -c locpoly.f -o locpoly.o 
-gfortran   -fpic  -g -O2  -c rlbin.f -o rlbin.o 
-gfortran   -fpic  -g -O2  -c sdiag.f -o sdiag.o 
-gfortran   -fpic  -g -O2  -c sstdiag.f -o sstdiag.o 
-gcc -std=gnu99 -shared -L/opt/shared/r/3.5.1/lib64/R/lib -L/opt/shared/gcc/4.9.4/lib -L/opt/shared/gcc/4.9.4/lib64 -o KernSmooth.so blkest.o cp.o dgedi.o dgefa.o dgesl.o init.o linbin.o linbin2D.o locpoly.o rlbin.o sdiag.o sstdiag.o -L/opt/shared/r/3.5.1/lib64/R/lib/atlas -lRblas -lgfortran -lm -lquadmath -lgfortran -lm -lquadmath -L/opt/shared/r/3.5.1/lib64/R/lib -lR 
-installing to /work/it_css/sw/r/add-ons/r3.5.1/testing/default/KernSmooth/libs 
-** R 
-** inst 
-** byte-compile and prepare package for lazy loading 
-** help 
-*** installing help indices 
-** building package indices 
-** testing if installed package can be loaded 
-* DONE (KernSmooth) 
- 
-The downloaded source packages are in 
-        ‘/tmp/RtmpVq5oBb/downloaded_packages’ 
-> library(KernSmooth) 
-KernSmooth 2.23 loaded 
-Copyright M. P. Wand 1997-2009 
-> 
-</code> 
- 
-Notice that the output of ''.libPaths()'' lists the personal library 
-directory first, which is why ''install.packages'' installed ''KernSmooth'' there. 
- 
- 
-=== Using IT's udbuild environment === 
-IT developed a formalized build procedure called [[abstract:caviness:install_software:install_software|udbuild]], 
-which can simplify the installation of modules.  Here is an example ''udbuild'' 
-script which can be used to install a personal R library. 
- 
-<file sh udbuild-testing-cuda> 
-#!/bin/bash -l 
- 
-PKGNAME=testing 
-VERSION=default 
- 
-UDBUILD_HOME=$WORKDIR/sw 
-PKG_LIST=' 
- WideLM rpud permGPU magma gputools cudaBayesregData cudaBayesreg 
- CARramps 
-' 
- 
-vpkg_devrequire udbuild r/3.1.1 r-cran/20140905 
-init_udbuildenv r-addon cuda/6.5 
- 
-#Sometimes R doesn't properly use CPPFLAGS which is set by VALET, fix that here: 
-CPATH=$CUDA_PREFIX/include:$CPATH 
-LIBRARY_PATH=$CUDA_PREFIX/lib64:$CUDA_PREFIX/lib64/stubs:$LIBRARY_PATH 
- 
-#CRAN_MIRROR='http://cran.cs.wwu.edu/' 
-CRAN_MIRROR='http://lib.stat.cmu.edu/R/CRAN/' 
- 
-quote() { printf '"%s", ' "$@" | sed 's/, $/\n/'; } 
- 
-R -q --no-save <<EOT 
- .libPaths() 
- options(repos=structure(c(CRAN="$CRAN_MIRROR"))) 
- for ( pkg in c( `quote $PKG_LIST` ) ) { 
- print(pkg) 
- install.packages(pkg, dependencies=TRUE) 
- } 
- 
- warnings() 
-EOT 
-</file> 
- 
-This script will attempt to build the CUDA-capable R modules against 
-CUDA 6.5 and install them into ''$WORKDIR/sw/r/add-ons/r3.1.1/testing/default-cuda-6.5''. 
- 
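The ''quote'' helper in the script turns the whitespace-separated package list into a comma-separated list of quoted strings that can be dropped into an R ''c( ... )'' call. The sketch below shows the same idea in isolation; ''quote_list'' is a hypothetical, slightly more portable variant (it deletes the trailing separator rather than replacing it with a newline), and the package list is shortened.

```shell
#!/bin/sh
# Hypothetical variant of the script's quote() helper: wrap every argument
# in double quotes, separate with ", ", and drop the trailing separator.
quote_list() {
    printf '"%s", ' "$@" | sed 's/, $//'
}

PKG_LIST='
 WideLM rpud
 CARramps
'

# $PKG_LIST is unquoted on purpose: word splitting turns the list into
# separate arguments, one per package name.
r_vector="c( $(quote_list $PKG_LIST) )"
echo "$r_vector"
```

The resulting string is what the here-document passes to R as the vector to loop over.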
-====== R script in batch ====== 
- 
-==== matmul.R script ==== 
- 
-Consider this simple R script, which multiplies a 3x4 matrix by its transpose to produce a 3x3 matrix: 
- 
-<file R matmul.R> 
-# Calculate and print small matrix AA' 
-a <- matrix(1:12,3,4); 
-a%*%t(a) 
-</file> 
- 
-Let's test this R script using ''Rscript'' from the command line on a compute node.  Don't forget to set your [[abstract:caviness:app_dev:compute_env#using-workgroup-and-directories|workgroup]] to define your cluster group or //investing-entity// compute nodes before you use ''salloc'' to get on a compute node. For example, 
- 
-<code bash> 
-workgroup -g it_css 
-salloc 
-vpkg_require r/3.5 
-Rscript matmul.R 
-</code> 
- 
-The output to the screen: 
- 
-<code> 
-     [,1] [,2] [,3] 
-[1,]  166  188  210 
-[2,]  188  214  240 
-[3,]  210  240  270 
-</code> 
- 
-To return to the head node, type 
-<code bash> 
-exit 
-</code> 
- 
-==== matmul.qs file ==== 
- 
-Running an R script in batch instead of on the command line takes nearly the same steps. Copy a template job submission script (for example, ''/opt/templates/slurm/generic/threads.qs'') and call it ''matmul.qs''. Now edit it to change the job name and add the commands for your job, something like this: 
- 
-<file bash matmul.qs> 
-#!/bin/bash -l 
-# 
-.... 
- 
-#SBATCH --job-name=matmultiply_R 
- 
-... 
-# 
-# [EDIT] Execute your OpenMP/threaded program using the srun command: 
-# 
-# Add vpkg_require commands 
-vpkg_require r/3.5 
- 
-# Syntax: Rscript [options] filename.R [arguments] 
-Rscript matmul.R  
-</file> 
- 
-Now to run the R script simply submit the job from the head node with the 
-''sbatch'' command. 
- 
-<code> 
-sbatch matmul.qs 
-</code> 
- 
-You should see a notification that your job was submitted, something like this: 
- 
-<code bash> 
-Submitted batch job 983119 
-</code> 
- 
-After the code completes the output of the script will appear in the file 
-''slurm-983119.out'' because the job number is 983119. Type  
- 
-<code> 
-more slurm-983119.out 
-</code> 
- 
-to display the contents of the output file on the screen.  For example, 
- 
-<code> 
--- OpenMP job setup complete: 
---  OMP_NUM_THREADS      = 2 
---  OMP_PROC_BIND        = true 
---  OMP_PLACES           = cores 
---  MP_BLIST             = 5,17 
- 
-Adding package `r/3.5.1` to your environment 
-     [,1] [,2] [,3] 
-[1,]  166  188  210 
-[2,]  188  214  240 
-[3,]  210  240  270 
-</code> 
- 
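The output file name follows from the job ID that ''sbatch'' reports. As a small sketch, assuming the submission message has the form shown on this page, the job ID can be cut out of the message and used to build the file name; the variable names are illustrative only.

```shell
#!/bin/sh
# Message as printed by sbatch in the example on this page
submit_msg="Submitted batch job 983119"

# Remove everything up to and including the last space, keeping the job ID
job_id="${submit_msg##* }"

# Default Slurm output file name for a job that is not an array job
out_file="slurm-${job_id}.out"
echo "$out_file"
```

In a real session the message would come from ''$(sbatch matmul.qs)''; ''sbatch --parsable'' is another way to get just the job ID.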
-====== Using R script in batch array job ====== 
-===== sweep.R file ===== 
- 
-Consider this simple script, which prints a fraction computed from its two arguments: 
- 
-<file R sweep.R> 
-args <- commandArgs(trailingOnly = TRUE) 
-# print fraction from argument list  
-as.numeric(args[1])/as.numeric(args[2]) 
-</file> 
- 
-This R script can be run from the command line on a compute node with the commands 
- 
-<code bash> 
-salloc 
-vpkg_require r/3.5 
-Rscript sweep.R 5 200 
-</code> 
- 
-The output to the screen: 
-<code> 
-[1] 0.025 
-</code> 
- 
-===== sweep.qs file ===== 
- 
-Again, copy a template job submission script (for example, ''/opt/templates/slurm/generic/threads.qs'') and call it ''sweep.qs''. Now edit it to change the job name, this time adding the option for an array job, and add the commands for your job, something like this: 
- 
-<file bash sweep.qs> 
-#!/bin/bash -l 
-# 
-.... 
- 
-#SBATCH --job-name=matmultiply_R 
-#SBATCH --array=1-200 
- 
-... 
-# 
-# [EDIT] Execute your OpenMP/threaded program using the srun command: 
-# 
-## Parameter sweep array job to run sweep.R with 
-##    lambda = 0, 1, 2, ..., 199 
-## 
-# Add vpkg_require commands 
-vpkg_require r/3.5 
- 
-date "+Start %s" 
-echo "Host $HOSTNAME" 
- 
-let lambda="$SLURM_ARRAY_TASK_ID-1" 
-let taskCount=200 
- 
-# Syntax: Rscript [options] filename.R [arguments] 
-Rscript --vanilla sweep.R $lambda $taskCount 
- 
-date "+Finish %s" 
-</file> 
- 
-The ''date'' and ''echo Host'' lines are just a way of keeping track of when and where the tasks run. 
-The array job has 200 tasks, all running the same script with different parameters (arguments).  The ''--vanilla'' option 
-keeps the tasks from reading or writing the same startup and workspace files in the shared directory. 
- 
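The mapping from task ID to sweep parameter used in ''sweep.qs'' can be sketched on its own. The function name ''lambda_for_task'' is hypothetical, and the default task ID of 6 is only there so the sketch runs outside a Slurm job.

```shell
#!/bin/sh
# Hypothetical helper: array task IDs start at 1, lambda starts at 0
lambda_for_task() {
    echo $(( $1 - 1 ))
}

# Slurm sets SLURM_ARRAY_TASK_ID inside each array task; fall back to 6
# here so the sketch also runs outside a job.
task_id="${SLURM_ARRAY_TASK_ID:-6}"
task_count=200
lambda=$(lambda_for_task "$task_id")

# Show the command this task would run instead of running it
echo "Rscript --vanilla sweep.R $lambda $task_count"
```

With task ID 6 this gives the arguments ''5 200'', the same pair used in the interactive ''Rscript sweep.R 5 200'' run.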
-To run this in batch you must submit the job from the head node with the 
-''sbatch'' command. 
- 
-<code> 
-sbatch sweep.qs 
-</code> 
- 
-After the tasks complete, the output of each one will appear, by default, in its own file, 
-''slurm-535064_1.out'' to ''slurm-535064_200.out''. The number 535064 is the job ID assigned to your job when submitted, and 1 to 200 is the task ID (corresponding to ''--array=1-200''). 
- 
-<code> 
-Adding dependency `x11/RHEL6.1` to your environment 
-Adding package `r/3.0.2` to your environment 
-[1] 0.025 
-</code> 
-<note tip> 
-You will want to do more than just print out one fraction in your script.  The integer parameter can be used for 
-a one dimensional parameter sweep, to construct unique input and output file names for each task,  
-or as a seed for the R Random Number Generator (RNG).</note> 
- 
-==== Writing files from an array job ==== 
- 
-You are running many tasks in the same directory.  Slurm handles the standard output by writing to 
-separate files, with the task ID appended to the job ID in each file name.  You need to take care of other file output in your R script. 
- 
-<note important> 
-You need to make sure no two of your jobs will write to the same file.  Look at your R script to see if you 
-are writing files.  Look for the ''**sink**'' command or any graphics writing commands such as ''**pdf**'' or ''**png**''. 
-If you are using these R functions, then use a unique file name constructed from the task id. 
-</note> 
- 
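One way to get unique names is to build them in the job script from the task ID and hand them to the R script as extra arguments. This is a sketch under assumptions: the ''task_file'' helper and the ''sweep-NNN'' naming scheme are hypothetical, not something the templates provide.

```shell
#!/bin/sh
# Hypothetical naming scheme: <prefix>-<zero-padded task id>.<extension>
task_file() {
    printf '%s-%03d.%s\n' "$1" "$2" "$3"
}

# Slurm sets SLURM_ARRAY_TASK_ID inside each array task; fall back to 1
# so the sketch also runs outside a job.
task_id="${SLURM_ARRAY_TASK_ID:-1}"

plot_file=$(task_file sweep "$task_id" pdf)
log_file=$(task_file sweep "$task_id" log)
echo "$plot_file $log_file"
```

The names could then be appended to the ''Rscript'' command line, read in R with ''commandArgs(trailingOnly = TRUE)'', and passed to ''pdf()'' or ''sink()''.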
-==== vanilla option ==== 
- 
-The command-line option ''--vanilla'' combines ''--no-save'', ''--no-restore'', ''--no-site-file'', ''--no-init-file'' and ''--no-environ''.  This way the tasks will not 
-be reading or writing the same files.  If you need initialization commands, put them in your R script instead of 
-in the init file ''.Rprofile''.  If you need some environment variables, export them in your bash script instead of assigning 
-them in your environ file ''.Renviron''. 
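
Exporting a variable in the job script makes it visible to every child process, so it does not depend on ''.Renviron''. The sketch below uses ''sh'' as a stand-in for ''Rscript'' so that it runs without R; the variable name ''SWEEP_DATA_DIR'' and its value are hypothetical.

```shell
#!/bin/sh
# Hypothetical variable, used only for illustration
export SWEEP_DATA_DIR="/tmp/sweep-data"

# Any child process inherits exported variables; sh stands in for
# Rscript --vanilla here so the sketch runs without R installed.
seen_by_child=$(sh -c 'printf "%s" "$SWEEP_DATA_DIR"')
echo "child sees: $seen_by_child"

# In the real job script the child would be something like
#   Rscript --vanilla sweep.R "$lambda" "$taskCount"
# and the R code would read the value with Sys.getenv("SWEEP_DATA_DIR").
```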
  
  • software/r/caviness.txt
  • Last modified: 2023-11-28 17:37
  • by anita