Differences
This shows you the differences between two versions of the page.
software:r:caviness [2021-03-17 14:27] – [matmul.qs file] anita | software:r:caviness [2023-11-28 17:36] – [personal/program specific R libraries and extensions] anita | ||
---|---|---|---|
Line 1: | Line 1: | ||
- | ===== R on Caviness ===== | ||
- | |||
- | ==== Learning R ==== | ||
- | |||
- | === SWIRL === | ||
- | In addition to other resources, SWIRL is installed on the Caviness cluster and is available as an interactive learning guide | ||
- | inside R: | ||
- | |||
- | < | ||
- | $ vpkg_require r-cran | ||
- | $ R -q --no-save | ||
- | > library(swirl) | ||
- | > swirl() | ||
- | </ | ||
- | |||
- | |||
- | |||
- | ==== R libraries and extensions ==== | ||
- | |||
- | === Installed library bundles === | ||
- | The cluster also has the majority of [[http:// | ||
- | and [[http:// | ||
- | installed. | ||
- | respective catalogs. | ||
- | packages based on dependencies. | ||
- | these bundles provide access to over 6,600 R modules, pre-compiled and ready | ||
- | for use. | ||
- | |||
- | ^r-cran | ||
- | ^r-cdf | ||
- | ^r-bioc | ||
- | ^r-fftw | ||
- | ^r-geo | ||
- | ^r-gnumath | ||
- | ^r-jags | ||
- | ^r-graph | ||
- | ^r-mpi | ||
- | ^r-all | ||
- | ^r-cuda | ||
- | |||
- | === Loading library bundles for use === | ||
- | < | ||
- | $ vpkg_require r-geo | ||
- | Adding dependency `r-bioc/ | ||
- | Adding dependency `gsl/1.16` to your environment | ||
- | Adding dependency `gmp/6.1.2` to your environment | ||
- | Adding dependency `glpk/4.65` to your environment | ||
- | Adding dependency `mpfr/ | ||
- | Adding dependency `r-gnumath/ | ||
- | Adding dependency `fftw/ | ||
- | Adding dependency `r-fftw/ | ||
- | Adding dependency `szip/ | ||
- | Adding dependency `hdf4/ | ||
- | Adding dependency `hdf5/ | ||
- | Adding dependency `netcdf/ | ||
- | Adding dependency `udunits/ | ||
- | Adding dependency `r-cdf/ | ||
- | Adding dependency `geos/ | ||
- | Adding dependency `gdal/ | ||
- | Adding dependency `proj/ | ||
- | Adding package `r-geo/ | ||
- | $ | ||
- | </ | ||
- | |||
- | Once the bundle is loaded, its libraries can be used in R as normal. | ||
- | |||
- | < | ||
- | $ R --no-save -q | ||
- | > library(CopulaRegression) | ||
- | Loading required package: MASS | ||
- | Loading required package: VineCopula | ||
- | > | ||
- | </ | ||
- | |||
- | === Learning about modules === | ||
- | IT provides a small script called '' | ||
- | documentation of R modules. | ||
- | a module to decide if it requires more research. | ||
- | must be installed, and the module bundle must be loaded with '' | ||
- | For example: | ||
- | |||
- | < | ||
- | $ vpkg_require r-cran | ||
- | $ r-info car | ||
- | Loading required package: carData | ||
- | |||
- | Information on package ‘car’ | ||
- | |||
- | Description: | ||
- | |||
- | Package: | ||
- | Version: | ||
- | Date: | ||
- | Title: | ||
- | |||
- | ... | ||
- | |||
- | Further information is available in the following vignettes in | ||
- | directory ‘/ | ||
- | |||
- | embedding: Using car functions inside user functions (source, pdf) | ||
- | $ | ||
- | </ | ||
- | |||
- | ==== personal/ | ||
- | You can create your own library of R modules that contains different | ||
- | versions from those provided through VALET, or modules not available via VALET. | ||
- | |||
- | R looks in an environment variable called ' | ||
- | locations to search for modules. | ||
- | in the list, this will allow your library to override any conflicts which | ||
- | may be installed on the system. | ||
- | modules into the first entry in this list by default. | ||
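The search-order behaviour described here can be sketched in the shell. This is a minimal sketch rather than the site's official procedure: the function name `prepend_r_lib` is hypothetical, and only the `R_LIBS` variable and its colon-separated format come from R itself.

```shell
#!/bin/bash
# Hypothetical helper: put a personal library directory at the front of
# R_LIBS so that R searches it (and installs into it) before other entries.
prepend_r_lib() {
    local dir="$1"
    mkdir -p "$dir"                    # R ignores R_LIBS entries that do not exist
    if [ -n "$R_LIBS" ]; then
        export R_LIBS="$dir:$R_LIBS"   # keep the existing entries after ours
    else
        export R_LIBS="$dir"
    fi
}
```

After calling the helper with a directory of your choice, running `.libPaths()` inside R should list that directory first.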
- | |||
- | === Simple example === | ||
- | Once this is done, you can install by using '' | ||
- | is an example: | ||
- | |||
- | < | ||
- | $ workgroup -g it_css | ||
- | $ vpkg_require r-cran | ||
- | Adding dependency `r/3.5.1` to your environment | ||
- | Adding package `r-cran/ | ||
- | $ mkdir -p $WORKDIR/ | ||
- | $ echo $R_LIBS | ||
- | / | ||
- | $ R_LIBS=" | ||
- | $ R -q --no-save | ||
- | > .libPaths() | ||
- | [1] "/ | ||
- | [2] "/ | ||
- | [3] "/ | ||
- | > chooseCRANmirror(all) | ||
- | Secure CRAN mirrors | ||
- | |||
- | 1: 0-Cloud [https] | ||
- | 3: Australia (Canberra) [https] | ||
- | 5: Australia (Melbourne 2) [https] | ||
- | 7: Austria [https] | ||
- | 9: Brazil (PR) [https] | ||
- | 11: Brazil (SP 1) [https] | ||
- | 13: Bulgaria [https] | ||
- | 15: China (Hong Kong) [https] | ||
- | 17: China (Shanghai) [https] | ||
- | 19: Czech Republic [https] | ||
- | 21: Ecuador (Cuenca) [https] | ||
- | 23: Estonia [https] | ||
- | 25: France (Marseille) [https] | ||
- | 27: Germany (Erlangen) [https] | ||
- | 29: Germany (Münster) [https] | ||
- | 31: Greece [https] | ||
- | 33: Iceland [https] | ||
- | 35: Italy (Padua) [https] | ||
- | 37: Japan (Yonezawa) [https] | ||
- | 39: Korea (Gyeongsan-si) [https] | ||
- | 41: Korea (Ulsan) [https] | ||
- | 43: Mexico (Mexico City) [https] | ||
- | 45: Philippines [https] | ||
- | 47: Spain (Madrid) [https] | ||
- | 49: Switzerland [https] | ||
- | 51: Turkey (Mersin) [https] | ||
- | 53: UK (London 1) [https] | ||
- | 55: USA (IA) [https] | ||
- | 57: USA (MI 1) [https] | ||
- | 59: USA (OR) [https] | ||
- | 61: USA (TX 1) [https] | ||
- | 63: (other mirrors) | ||
- | |||
- | Selection: 55 | ||
- | > install.packages(" | ||
- | Installing package into ‘/ | ||
- | (as ‘lib’ is unspecified) | ||
- | |||
- | trying URL ' | ||
- | Content type ' | ||
- | ================================================== | ||
- | downloaded 23 KB | ||
- | |||
- | * installing *source* package ‘KernSmooth’ ... | ||
- | ** package ‘KernSmooth’ successfully unpacked and MD5 sums checked | ||
- | ** libs | ||
- | gfortran | ||
- | gfortran | ||
- | gfortran | ||
- | gfortran | ||
- | gfortran | ||
- | gcc -std=gnu99 -I"/ | ||
- | gfortran | ||
- | gfortran | ||
- | gfortran | ||
- | gfortran | ||
- | gfortran | ||
- | gfortran | ||
- | gcc -std=gnu99 -shared -L/ | ||
- | installing to / | ||
- | ** R | ||
- | ** inst | ||
- | ** byte-compile and prepare package for lazy loading | ||
- | ** help | ||
- | *** installing help indices | ||
- | ** building package indices | ||
- | ** testing if installed package can be loaded | ||
- | * DONE (KernSmooth) | ||
- | |||
- | The downloaded source packages are in | ||
- | ‘/ | ||
- | > library(KernSmooth) | ||
- | KernSmooth 2.23 loaded | ||
- | Copyright M. P. Wand 1997-2009 | ||
- | > | ||
- | </ | ||
- | |||
- | Notice that the output of '' | ||
- | directory first? | ||
- | |||
- | |||
- | === Using IT's udbuild environment === | ||
- | IT developed a formalization for installing modules called [[abstract: | ||
- | which can simplify the installation of modules. | ||
- | script which can be used to install a personal R library. | ||
- | |||
- | <file sh udbuild-testing-cuda> | ||
- | #!/bin/bash -l | ||
- | |||
- | PKGNAME=testing | ||
- | VERSION=default | ||
- | |||
- | UDBUILD_HOME=$WORKDIR/ | ||
- | PKG_LIST=' | ||
- | WideLM rpud permGPU magma gputools cudaBayesregData cudaBayesreg | ||
- | CARramps | ||
- | ' | ||
- | |||
- | vpkg_devrequire udbuild r/3.1.1 r-cran/ | ||
- | init_udbuildenv r-addon cuda/6.5 | ||
- | |||
- | #Sometimes R doesn' | ||
- | CPATH=$CUDA_PREFIX/ | ||
- | LIBRARY_PATH=$CUDA_PREFIX/ | ||
- | |||
- | # | ||
- | CRAN_MIRROR=' | ||
- | |||
- | quote() { printf '" | ||
- | |||
- | R -q --no-save <<EOT | ||
- | .libPaths() | ||
- | options(repos=structure(c(CRAN=" | ||
- | for ( pkg in c( `quote $PKG_LIST` ) ) { | ||
- | print(pkg) | ||
- | install.packages(pkg, | ||
- | } | ||
- | |||
- | warnings() | ||
- | EOT | ||
- | </ | ||
- | |||
- | This script will attempt to build the CUDA-capable R modules against | ||
- | CUDA version 6.5 and install them into '' | ||
- | |||
- | ====== R script in batch ====== | ||
- | |||
- | ==== matmul.R script ==== | ||
- | |||
- | Consider a simple R script that multiplies a small matrix by its transpose, giving a 3x3 result: | ||
- | |||
- | <file R matmul.R> | ||
- | # Calculate and print small matrix AA' | ||
- | a <- matrix(1: | ||
- | a%*%t(a) | ||
- | </ | ||
- | |||
- | Let's test this R script using '' | ||
- | |||
- | <code bash> | ||
- | workgroup -g it_css | ||
- | salloc | ||
- | vpkg_require r/3.5 | ||
- | Rscript matmul.R | ||
- | </ | ||
- | |||
- | The output to the screen: | ||
- | |||
- | < | ||
- | [,1] [,2] [,3] | ||
- | [1,] 166 188 210 | ||
- | [2,] 188 214 240 | ||
- | [3,] 210 240 270 | ||
- | </ | ||
- | |||
- | To return to the head node, type: | ||
- | <code bash> | ||
- | exit | ||
- | </ | ||
- | |||
- | ==== matmul.qs file ==== | ||
- | |||
- | Running an R script in batch instead of on the command line takes nearly the same steps. Copy a template job submission script (''/ | ||
- | |||
- | <file bash matmul.qs> | ||
- | #!/bin/bash -l | ||
- | # | ||
- | .... | ||
- | |||
- | #SBATCH --job-name=matmultiply_R | ||
- | |||
- | ... | ||
- | # | ||
- | # [EDIT] Execute your OpenMP/ | ||
- | # | ||
- | # Add vpkg_require commands | ||
- | vpkg_require r/3.5 | ||
- | |||
- | # Syntax: Rscript [options] filename.R [arguments] | ||
- | Rscript matmul.R | ||
- | </ | ||
- | |||
- | To run the R script, simply submit the job from the head node with the | ||
- | '' | ||
- | |||
- | < | ||
- | sbatch matmul.qs | ||
- | </ | ||
- | |||
- | You should see a notification that your job was submitted. | ||
- | |||
- | <code bash> | ||
- | Submitted batch job 983119 | ||
- | </ | ||
- | |||
- | After the code completes, the output of the script will appear in the file | ||
- | '' | ||
- | |||
- | < | ||
- | more slurm-983119.out | ||
- | </ | ||
- | |||
- | to display the contents of the output file on the screen. | ||
- | |||
- | < | ||
- | -- OpenMP job setup complete: | ||
- | -- OMP_THREAD_LIMIT | ||
- | -- OMP_PROC_BIND | ||
- | -- OMP_PLACES | ||
- | -- MP_BLIST | ||
- | |||
- | Adding package `r/3.5.1` to your environment | ||
- | [,1] [,2] [,3] | ||
- | [1,] 166 188 210 | ||
- | [2,] 188 214 240 | ||
- | [3,] 210 240 270 | ||
- | </ | ||
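The link between the submission message and the output file can be sketched as a small shell helper. This assumes Slurm's default output naming for a non-array job (`slurm-<jobid>.out`); the function name `slurm_output_file` is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: turn sbatch's confirmation message into the name of
# the default output file (slurm-<jobid>.out for a non-array job).
slurm_output_file() {
    local message="$1"
    local jobid="${message##* }"   # keep only the text after the last space
    echo "slurm-${jobid}.out"
}

# Example using the confirmation message shown earlier.
slurm_output_file "Submitted batch job 983119"
```

The file only exists once the job has started writing output, so the name is best used after the job has completed.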
- | |||
- | ====== Using R script in batch array job ====== | ||
- | ===== sweep.R file ===== | ||
- | |||
- | Consider a simple script that prints a fraction computed from its argument list: | ||
- | |||
- | <file R sweep.R> | ||
- | args <- commandArgs(trailingOnly = TRUE) | ||
- | # print fraction from argument list | ||
- | as.numeric(args[1])/ | ||
- | </ | ||
- | |||
- | This is an R script that can be run from the command line on a compute node with the commands: | ||
- | |||
- | <code bash> | ||
- | salloc | ||
- | vpkg_require r/3.5 | ||
- | Rscript sweep.R 5 200 | ||
- | </ | ||
- | |||
- | The output to the screen: | ||
- | < | ||
- | [1] 0.025 | ||
- | </ | ||
- | |||
- | ===== sweep.qs file ===== | ||
- | |||
- | Again copy a template job submission script (/ | ||
- | |||
- | <file bash sweep.qs> | ||
- | #!/bin/bash -l | ||
- | # | ||
- | .... | ||
- | |||
- | #SBATCH --job-name=sweep_R | ||
- | #SBATCH --array=1-200 | ||
- | |||
- | ... | ||
- | # | ||
- | # [EDIT] Execute your OpenMP/ | ||
- | # | ||
- | ## Parameter sweep array job to run the sweep.R | ||
- | ## lambda = 0, 1, 2, ..., 199 | ||
- | ## | ||
- | # Add vpkg_require commands | ||
- | vpkg_require r/3.5 | ||
- | |||
- | date " | ||
- | echo "Host $HOSTNAME" | ||
- | |||
- | let lambda=" | ||
- | let taskCount=200 | ||
- | |||
- | # Syntax: Rscript [options] filename.R [arguments] | ||
- | Rscript --vanilla sweep.R $lambda $taskCount | ||
- | |||
- | date " | ||
- | </ | ||
- | |||
- | The '' | ||
- | There will be 200 array jobs all running the same script with different parameters (arguments). | ||
- | is used to prevent the multiple jobs from using the same disk space. | ||
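How each array task can derive its own parameter may be sketched as follows. This is a sketch under an assumption: lambda is taken to be the task ID minus one, which matches the script comment (lambda runs from 0 to 199 while the array index runs from 1 to 200). `SLURM_ARRAY_TASK_ID` is the variable Slurm sets for each array task; the function name `lambda_for_task` is hypothetical.

```shell
#!/bin/bash
# Assumed mapping: array index 1..200 -> lambda 0..199.
lambda_for_task() {
    echo $(( $1 - 1 ))
}

# Outside Slurm, SLURM_ARRAY_TASK_ID is unset; default to 6 for a dry run.
task_id="${SLURM_ARRAY_TASK_ID:-6}"
lambda="$(lambda_for_task "$task_id")"
task_count=200

# Print the command instead of running it, so the sketch works without R.
echo "Rscript --vanilla sweep.R $lambda $task_count"
```

With the dry-run default of 6, the printed arguments are 5 and 200, the same pair used in the interactive example.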
- | |||
- | To run this in batch you must submit the job from the head node with the | ||
- | '' | ||
- | |||
- | < | ||
- | sbatch sweep.qs | ||
- | </ | ||
- | |||
- | You will see a notification that the job was submitted, like this: | ||
- | |||
- | < | ||
- | Submitted batch job 1170728 | ||
- | </ | ||
- | |||
- | After the code completes, the output of the script will appear in the files | ||
- | '' | ||
- | |||
- | If we look specifically at the array job output that maps to our previous example using '' | ||
- | < | ||
- | -- OpenMP job setup complete: | ||
- | -- OMP_THREAD_LIMIT | ||
- | -- OMP_PROC_BIND | ||
- | -- OMP_PLACES | ||
- | -- MP_BLIST | ||
- | |||
- | Adding package `r/3.5.1` to your environment | ||
- | Start 1567531210 | ||
- | Host r00n15.localdomain.hpc.udel.edu | ||
- | [1] 0.025 | ||
- | Finish 1567531210 | ||
- | </ | ||
- | <note tip> | ||
- | You will want to do more than just print out one fraction in your script. | ||
- | a one dimensional parameter sweep, to construct unique input and output file names for each task, | ||
- | or as a seed for the R Random Number Generator (RNG).</ | ||
- | |||
- | ==== Writing files from an array job ==== | ||
- | |||
- | You are running many jobs in the same directory. | ||
- | separate files with "dot taskid" | ||
- | |||
- | <note important> | ||
- | You need to make sure no two of your jobs will write to the same file. Look at your R script to see if you | ||
- | are writing files. | ||
- | If you are using these R functions, then use a unique file name constructed from the task id. | ||
- | </ | ||
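One way to build a unique file name from the task ID is sketched below. The naming pattern (`prefix.taskid.ext`) and the function name `task_file` are hypothetical choices for illustration; only `SLURM_ARRAY_TASK_ID` is provided by Slurm.

```shell
#!/bin/bash
# Hypothetical helper: build a per-task file name such as results.17.csv.
task_file() {
    local prefix="$1" ext="$2"
    local id="${SLURM_ARRAY_TASK_ID:-0}"   # falls back to 0 outside an array job
    echo "${prefix}.${id}.${ext}"
}

outfile="$(task_file results csv)"
echo "This task would write to: $outfile"
```

In a job script the name could then be passed to the R script as an extra argument and read there with `commandArgs(trailingOnly = TRUE)`, so that no two tasks write to the same file.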
- | |||
- | ==== vanilla option ==== | ||
- | |||
- | The command-line option '' | ||
- | be reading or writing to the same files. | ||
- | in the init-file '' | ||
- | them in your environ file '' | ||