software:r:farber

software:r:farber [2018-04-26 13:23] – [personal/program specific R libraries and extensions] sraskar
software:r:farber [2021-03-17 14:44] (current) – [matmul.qs file] anita
cuda 6.5 version into ''$WORKDIR/sw/r/add-ons/r3.1.1/testing/default-cuda-6.5''.

====== R script in batch ======

===== matmul.R script =====

Consider this simple R script, which multiplies a small 3x4 matrix ''a'' by its transpose to give the 3x3 matrix AA':

<file R matmul.R>
# Calculate and print the small 3x3 matrix AA'
a <- matrix(1:12, 3, 4)
# Rscript prints the value of this top-level expression
a %*% t(a)
</file>

Let's test this R script using ''Rscript'' from the command line on a compute node. Don't forget to set your [[general/userguide/04_compute_environ?&#using-workgroup-and-directories|workgroup]], which selects the compute nodes of your cluster group (//investing-entity//), before you use ''qlogin'' to get onto a compute node. For example,

<code bash>
workgroup -g it_css    # it_css is an example; use your own workgroup
qlogin                 # start an interactive session on a compute node
vpkg_require r/3       # add R 3.x to the environment
Rscript matmul.R
</code>

The output on the screen:

<code>
     [,1] [,2] [,3]
[1,]  166  188  210
[2,]  188  214  240
[3,]  210  240  270
</code>

To return to the head node, type:
<code bash>
exit
</code>

===== matmul.qs file =====

Running an R script in batch instead of from the command line takes nearly the same steps.
Consider the queue submission script file:

<file bash matmul.qs>
#$ -N matmultiply

# Add vpkg_require commands after this line:
vpkg_require r/3

# Syntax: Rscript [options] filename.R [arguments]
Rscript matmul.R
</file>

To run the R script, submit the job from the head node with the ''qsub'' command:

<code bash>
qsub matmul.qs
</code>

You should see a notification that your job was submitted, something like this:

<code>
Your job 2283886 ("matmultiply") has been submitted
</code>

When the job completes, the output of the script appears in the file ''matmultiply.o2283886''. The file name is the job name, ''matmultiply'' (set by ''-N matmultiply'' in ''matmul.qs'' and shown as ''("matmultiply")'' in the notification above), followed by ''.o'' and the job ID ''2283886''. Type

<code bash>
more matmultiply.o2283886
</code>

to display the contents of the output file on the screen. For example,

<code>
Adding dependency `x11/RHEL6.1` to your environment
Adding package `r/3.0.2` to your environment
     [,1] [,2] [,3]
[1,]  166  188  210
[2,]  188  214  240
[3,]  210  240  270
</code>
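The output file name can also be worked out in a script. The sketch below assumes the notification has exactly the form shown above, ''Your job <ID> ("<name>") has been submitted'', and extracts the job ID and job name with standard ''awk'' and ''sed''. The message is pasted in so the example runs without a scheduler; on the head node you would capture it with ''msg=$(qsub matmul.qs)'' instead.

<code bash>
#!/bin/bash
# Sketch: derive the output file name from qsub's submission message.
# Assumes the message format shown above; the text is pasted in here so the
# example runs without a scheduler. On the head node you could instead use:
#   msg=$(qsub matmul.qs)
msg='Your job 2283886 ("matmultiply") has been submitted'

# The third word is the job ID
jobid=$(echo "$msg" | awk '{print $3}')
# The word inside ("...") is the job name set by "#$ -N"
jobname=$(echo "$msg" | sed -n 's/.*("\([^"]*\)").*/\1/p')

# The output file is named <job name>.o<job ID>
outfile="${jobname}.o${jobid}"
echo "$outfile"
</code>

For the message above it prints ''matmultiply.o2283886'', and ''more "$outfile"'' then displays the file once the job has finished.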

====== Using R script in batch array job ======
===== sweep.R file =====

Consider this simple script, which prints the fraction formed by its first two command-line arguments:

<file R sweep.R>
args <- commandArgs(trailingOnly = TRUE)
# print the fraction args[1]/args[2] from the argument list
as.numeric(args[1]) / as.numeric(args[2])
</file>

This R script can be run from the command line on a compute node with the commands

<code bash>
qlogin
vpkg_require r/3
Rscript sweep.R 5 200
</code>

The output on the screen:
<code>
[1] 0.025
</code>

===== sweep.qs file =====

Consider the queue script file:

<file bash sweep.qs>
#$ -N sweep
#$ -t 1-200
##
## Parameter sweep array job to run sweep.R with
##    lambda = 0, 1, 2, ..., 199
##

# Add vpkg_require commands after this line:
vpkg_require r/3

date "+Start %s"
echo "Host $HOSTNAME"

# SGE_TASK_ID runs from 1 to 200, so lambda runs from 0 to 199
let lambda="$SGE_TASK_ID-1"
let taskCount=200

# Syntax: Rscript [options] filename.R [arguments]
Rscript --vanilla sweep.R $lambda $taskCount

date "+Finish %s"
</file>
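
The ''date'' and ''echo "Host $HOSTNAME"'' lines record when (in seconds since the epoch) and on which node each task runs.
Grid Engine runs 200 array tasks (''-t 1-200''), all executing the same script but each with its own ''$SGE_TASK_ID'' and therefore different arguments. The ''--vanilla'' option
starts each task in a clean R session that neither reads nor writes R's shared start-up and workspace files (see the vanilla option section below).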
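Before submitting 200 tasks, it can help to check the mapping from task ID to arguments. The following sketch reproduces the arithmetic of ''sweep.qs'' in an ordinary loop, with no scheduler and no R: it sets ''SGE_TASK_ID'' itself (normally Grid Engine does this) and only prints the command each task would run.

<code bash>
#!/bin/bash
# Dry run of the sweep.qs parameter mapping (no Grid Engine, no R needed).
# Grid Engine normally sets SGE_TASK_ID to 1..200 because of "#$ -t 1-200";
# here the loop sets it so we can see what every task would execute.
taskCount=200
plan=$(
    for SGE_TASK_ID in $(seq 1 "$taskCount"); do
        lambda=$((SGE_TASK_ID - 1))   # same as: let lambda="$SGE_TASK_ID-1"
        echo "task $SGE_TASK_ID: Rscript --vanilla sweep.R $lambda $taskCount"
    done
)

# First two and last planned commands
echo "$plan" | head -n 2
echo "$plan" | tail -n 1
</code>

Task 6, for example, would run ''sweep.R 5 200'', the same arguments as the command-line test above, which printed ''[1] 0.025''.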

To run this in batch, you must submit the job from the head node with the ''qsub'' command:

<code bash>
qsub sweep.qs
</code>

When the job completes, the output of each task appears in its own file, from ''sweep.o535064.1'' to ''sweep.o535064.200''. The number 535064 is the job ID assigned when the job was submitted, and the final number (1 to 200) is the task ID, taken from the range in ''-t 1-200''. For example, task 6 runs ''sweep.R'' with ''lambda'' = 5, and apart from the Start, Host and Finish lines its output file contains:

<code>
Adding dependency `x11/RHEL6.1` to your environment
Adding package `r/3.0.2` to your environment
[1] 0.025
</code>
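Because the shell sorts file names as text, ''sweep.o535064.10'' is listed before ''sweep.o535064.2''. A minimal sketch for collecting the ''[1]'' result line from every task in task order follows. It uses the example job ID 535064 and, so that it runs anywhere, first creates three stand-in output files in a scratch directory; drop that part when working with real output.

<code bash>
#!/bin/bash
# Sketch: collect the "[1] value" line from each task's output file, in task order.
# jobid is the example job ID from this page; substitute your own.
jobid=535064

# --- Stand-in output files so the sketch runs anywhere (omit for real output) ---
start=$PWD
workdir=$(mktemp -d)
cd "$workdir" || exit 1
for t in 1 2 10; do
    # task t used lambda = t - 1 and taskCount = 200
    awk -v l="$((t - 1))" 'BEGIN { print "[1]", l / 200 }' > "sweep.o${jobid}.${t}"
done
# ---------------------------------------------------------------------------

# A glob lists sweep.o535064.10 before sweep.o535064.2, so sort numerically
# on the task number, the third "."-separated field of the file name.
summary=$(
    for f in $(ls sweep.o"${jobid}".* | sort -t. -k3,3n); do
        value=$(grep '^\[1]' "$f" | awk '{print $2}')
        echo "${f##*.} $value"
    done
)
echo "$summary"

# Clean up the stand-in files
cd "$start" && rm -rf "$workdir"
</code>

With the three stand-in files it prints ''1 0'', ''2 0.005'' and ''10 0.045'', one per line. The ''grep'' skips the ''Adding dependency'' and other lines that also appear in real output files.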
<note tip>
In practice your script will do more than print one fraction. The integer parameter can drive a one-dimensional parameter sweep,
be used to construct unique input and output file names for each task,
or serve as the seed for the R random number generator (RNG).</note>

==== Writing files from an array job ====

All tasks of an array job run in the same directory. Grid Engine keeps their standard output apart by writing each task to
a separate file with the task ID appended (for example ''sweep.o535064.1''). Any other files your R script writes are your responsibility.

<note important>
Make sure no two of your tasks write to the same file. Check your R script for file output: look for the ''**sink**'' function
or graphics device functions such as ''**pdf**'' or ''**png**''.
If you use these R functions, give each task a unique file name constructed from its task ID.
</note>
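One way to do this is to build the file name in the job script and pass it to R as an extra argument. In the sketch below the name pattern ''sweep-task-NNN.pdf'' and the helper ''make_outfile'' are hypothetical, and ''sweep.R'' would need a matching change (for example reading ''args[3]'' and calling ''pdf(args[3])''). The loop at the end only checks that tasks 1 to 200 never share a name.

<code bash>
#!/bin/bash
# Sketch: one output file name per array task, built from the task ID.
# make_outfile and the "sweep-task-NNN.pdf" pattern are hypothetical names.
make_outfile() {
    # Zero-pad to three digits so the files list in task order
    printf 'sweep-task-%03d.pdf\n' "$1"
}

# In sweep.qs this would become (sweep.R must then use args[3]):
#   outfile=$(make_outfile "$SGE_TASK_ID")
#   Rscript --vanilla sweep.R $lambda $taskCount "$outfile"

# Check that tasks 1..200 get 200 distinct names
names=$(for id in $(seq 1 200); do make_outfile "$id"; done)
echo "$names" | head -n 2
echo "distinct names: $(echo "$names" | sort -u | grep -c .)"
</code>

Zero-padding is a convenience, not a requirement: any scheme works as long as each task ID maps to a different name.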
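
==== vanilla option ====

The command-line option ''--vanilla'' combines ''--no-save'', ''--no-restore'', ''--no-site-file'', ''--no-init-file'' and ''--no-environ''. Each task therefore starts without reading the site and user profiles (such as ''.Rprofile'') or environment files (such as ''.Renviron''), and without restoring or saving a workspace (''.RData''), so none of the tasks reads or writes these shared files when R starts or exits.
If you need initialization commands, put them in your R script instead of in the init file ''.Rprofile''. If you need environment variables, export them in your bash script instead of assigning
them in your environ file ''.Renviron''.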
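For example, a setting you might otherwise keep in ''.Renviron'' can be exported near the top of ''sweep.qs''. ''--no-environ'' only skips the ''.Renviron'' files; variables exported by the shell that starts ''Rscript'' are still passed to R, where ''Sys.getenv()'' can read them. In this sketch ''SWEEP_OUTDIR'' is a hypothetical variable name, and a child ''sh'' stands in for ''Rscript'' so that it runs without R.

<code bash>
#!/bin/bash
# Sketch: export settings in the job script instead of putting them in ~/.Renviron.
# SWEEP_OUTDIR is a hypothetical variable; in R it would be read with
#   Sys.getenv("SWEEP_OUTDIR")
export SWEEP_OUTDIR="$PWD/results"

# Every child process inherits exported variables. A child sh stands in for
#   Rscript --vanilla sweep.R $lambda $taskCount
seen=$(sh -c 'echo "$SWEEP_OUTDIR"')
echo "child process sees SWEEP_OUTDIR=$seen"
</code>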
  
Last modified: 2018-04-26 13:23 by sraskar