software:r:farber

software:r:farber [2018-04-26 13:20] – [R libraries and extensions] sraskarsoftware:r:farber [2021-03-17 14:44] (current) – [matmul.qs file] anita
Line 245: Line 245:
  
 === Using IT's udbuild environment === === Using IT's udbuild environment ===
-IT developed a formalization for installing modules called [[farber:udbuild]]+IT developed a formalization for installing modules called [[/abstract/farber/install_software|udbuild]]
 which can simplify the installation of modules.  Here is an example ''udbuild'' which can simplify the installation of modules.  Here is an example ''udbuild''
 script which can be used to install a personal R library. script which can be used to install a personal R library.
Line 288: Line 288:
 cuda 6.5 version into ''$WORKDIR/sw/r/add-ons/r3.1.1/testing/default-cuda-6.5''. cuda 6.5 version into ''$WORKDIR/sw/r/add-ons/r3.1.1/testing/default-cuda-6.5''.
  
 +====== R script in batch ======
 +
 +==== matmul.R script ====
 +
 +Consider a simple R script file that multiplies a small 3x4 matrix by its transpose, giving a 3x3 result:
 +
 +<file R matmul.R>
 +# Calculate and print small matrix AA'
 +a <- matrix(1:12,3,4);
 +a%*%t(a)
 +</file>
 +
 +Let's test this R script using ''Rscript'' from the command line on a compute node.  Before you use ''qlogin'' to get on a compute node, set your [[general/userguide/04_compute_environ?&#using-workgroup-and-directories|workgroup]] to select your cluster group (//investing-entity//) compute nodes. For example,
 +
 +<code bash>
 +workgroup -g it_css
 +qlogin
 +vpkg_require r/3
 +Rscript matmul.R
 +</code>
 +
 +The output to the screen:
 +
 +<code>
 +     [,1] [,2] [,3]
 +[1,]  166  188  210
 +[2,]  188  214  240
 +[3,]  210  240  270
 +</code>
 +
 +To return to the head node, type
 +<code bash>
 +exit
 +</code>
 +
 +==== matmul.qs file ====
 +
 +Running an R script in batch takes nearly the same steps as running it on the command line.
 +Consider the queue submission script file:
 +
 +<file bash matmul.qs>
 +#$ -N matmultiply
 +
 +# Add vpkg_require commands after this line:
 +vpkg_require r/3
 +
 +# Syntax: Rscript [options] filename.R [arguments]
 +Rscript matmul.R 
 +</file>
 +
 +Now to run the R script simply submit the job from the head node with the
 +''qsub'' command.
 +
 +<code>
 +qsub matmul.qs
 +</code>
 +
 +You should see a notification that your job was submitted, something like this:
 +
 +<code bash>
 +Your job 2283886 ("matmultiply") has been submitted
 +</code>
 +
 +After the job completes, the output of the script will appear in the file
 +''matmultiply.o2283886''. The ''-N matmultiply'' line in ''matmul.qs'' sets the job name, which appears in the notification above as ''("matmultiply")'', and ''2283886'' is the job ID assigned at submission. Type 
 +
 +<code>
 +more matmultiply.o2283886
 +</code>
 +
 +to display the contents of the output file on the screen.  For example,
 +
 +<code>
 +Adding dependency `x11/RHEL6.1` to your environment
 +Adding package `r/3.0.2` to your environment
 +     [,1] [,2] [,3]
 +[1,]  166  188  210
 +[2,]  188  214  240
 +[3,]  210  240  270
 +</code>
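
The naming rule described above (job name, then ''.o'', then job ID) can be sketched in shell. This is only an illustration: ''output_file_for'' is a hypothetical helper, and it assumes the notification line has exactly the format shown earlier on this page.

```shell
# Hypothetical helper: derive the standard output file name from the
# notification line printed by qsub.
# Assumes the format: Your job <id> ("<name>") has been submitted
output_file_for() {
    # $1 is the full notification line
    printf '%s\n' "$1" |
        sed -n 's/^Your job \([0-9][0-9]*\) ("\(.*\)") has been submitted.*$/\2.o\1/p'
}

output_file_for 'Your job 2283886 ("matmultiply") has been submitted'
# prints matmultiply.o2283886
```

If the line does not match the assumed format, the helper prints nothing, so check for an empty result before using it.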
 +
 +====== Using R script in batch array job ======
 +===== sweep.R file =====
 +
 +Consider a simple script that prints the quotient of its first two command-line arguments:
 +
 +<file R sweep.R>
 +args <- commandArgs(trailingOnly = TRUE)
 +# print fraction from argument list 
 +as.numeric(args[1])/as.numeric(args[2])
 +</file>
 +
 +This R script can be run from the command line on a compute node with the commands
 +
 +<code bash>
 +qlogin
 +vpkg_require r/3
 +Rscript sweep.R 5 200
 +</code>
 +
 +The output to the screen:
 +<code>
 +[1] 0.025
 +</code>
 +
 +===== sweep.qs file =====
 +
 +Consider the queue script file
 +
 +<file bash sweep.qs>
 +#$ -N sweep
 +#$ -t 1-200
 +## 
 +## Parameter sweep array job to run sweep.R with
 +##    lambda = 0, 1, 2, ..., 199
 +##
 +
 +# Add vpkg_require commands after this line:
 +vpkg_require r/3
 +
 +date "+Start %s"
 +echo "Host $HOSTNAME"
 +
 +let lambda="$SGE_TASK_ID-1"
 +let taskCount=200
 +
 +# Syntax: Rscript [options] filename.R [arguments]
 +Rscript --vanilla sweep.R $lambda $taskCount
 +
 +date "+Finish %s"
 +</file>
 +
 +The ''date'' and ''echo Host'' lines simply record when and where each task runs.
 +The array job runs 200 tasks, all executing the same script with different parameters (arguments).  The ''--vanilla'' option
 +keeps R from reading startup files or saving and restoring a workspace, so the tasks do not share those files.
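
The mapping from task ID to script arguments can be checked without the scheduler. The sketch below is a dry run: ''task_args'' is a hypothetical helper, and ''SGE_TASK_ID'' is set by hand in the loop, whereas in a real array job Grid Engine sets it for each task.

```shell
# Dry run of the task-to-argument mapping used in sweep.qs.
taskCount=200

# Hypothetical helper: print the arguments that the task with ID $1
# would pass to sweep.R (lambda = task ID - 1, then the task count).
task_args() {
    echo "$(( $1 - 1 )) $taskCount"
}

# SGE_TASK_ID is assigned by hand here for illustration only.
for SGE_TASK_ID in 1 2 200; do
    echo "task $SGE_TASK_ID: Rscript --vanilla sweep.R $(task_args "$SGE_TASK_ID")"
done
```

The loop only echoes the command lines, so it is safe to run on the head node before submitting the real job.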
 +
 +To run this in batch you must submit the job from the head node with the
 +''qsub'' command.
 +
 +<code>
 +qsub sweep.qs
 +</code>
 +
 +After the job completes, the output of each task will appear in its own file, ''sweep.o535064.1'' to ''sweep.o535064.200''.
 +The number 535064 is the job ID assigned to your job when submitted, and 1 to 200 is the task ID (i.e., the range given by ''-t 1-200''). For example, the file for task 6 (''lambda = 5'') contains
 +
 +<code>
 +Adding dependency `x11/RHEL6.1` to your environment
 +Adding package `r/3.0.2` to your environment
 +[1] 0.025
 +</code>
 +<note tip>
 +You will want to do more than just print out one fraction in your script.  The integer parameter can be used for
 +a one-dimensional parameter sweep, to construct unique input and output file names for each task,
 +or as a seed for the R Random Number Generator (RNG).</note>
 +
 +==== Writing files from an array job ====
 +
 +An array job runs many tasks in the same directory.  Grid Engine keeps the standard output separate by writing each task's
 +output to its own file, with "dot taskid" appended to the job ID.  Any other file output is up to your R script.
 +
 +<note important>
 +You need to make sure no two of your jobs will write to the same file.  Look at your R script to see if you
 +are writing files.  Look for the ''**sink**'' command or any graphics writing commands such as ''**pdf**'' or ''**png**''.
 +If you are using these R functions, then use a unique file name constructed from the task id.
 +</note>
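
One way to get a unique file name per task is to build it in the queue script and pass it to the R script as an extra argument. The sketch below assumes a hypothetical helper named ''task_file'' and a hypothetical naming scheme (prefix, zero-padded task ID, extension); neither comes from this page.

```shell
# Hypothetical helper: build a file name that is unique to one array task.
# $1 = prefix, $2 = extension,
# $3 = task ID (defaults to SGE_TASK_ID, or to 1 outside an array job)
task_file() {
    printf '%s_%03d.%s\n' "$1" "${3:-${SGE_TASK_ID:-1}}" "$2"
}

task_file sweep pdf 7    # prints sweep_007.pdf
```

Inside ''sweep.qs'' the name could be appended to the ''Rscript'' line, for example ''"$(task_file sweep pdf)"'', and the R script would then read it from ''commandArgs'' as a third argument and hand it to ''pdf'' or ''sink''. That third argument is an assumption; the ''sweep.R'' shown above only uses two.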
 +
 +==== vanilla option ====
 +
 +The command-line option ''--vanilla'' combines ''--no-save'', ''--no-restore'', ''--no-site-file'', ''--no-init-file'' and ''--no-environ''.  This way your tasks will not
 +all be reading or writing the same files.  If you need initialization commands, put them in your R script instead of
 +in the init-file ''.Rprofile''.  If you need some environment variables, export them in your bash script instead of assigning
 +them in your environ file ''.Renviron''.
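
Exporting a variable in the queue script, as suggested above, can be sketched as follows. ''SWEEP_SEED'' is a hypothetical variable name used only for illustration, and a child shell stands in for ''Rscript'' so the example runs anywhere.

```shell
# SWEEP_SEED is a hypothetical variable name, not one R or Grid Engine defines.
export SWEEP_SEED=42

# Any child process started after the export inherits the variable.
# A child shell stands in for Rscript here.
sh -c 'echo "child sees SWEEP_SEED=$SWEEP_SEED"'
# prints child sees SWEEP_SEED=42
```

In the R script the value would be available through ''Sys.getenv("SWEEP_SEED")'', which returns it as a character string.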
  
  • software/r/farber.1524763234.txt.gz
  • Last modified: 2018-04-26 13:20
  • by sraskar