software:r:farber

Differences

This shows you the differences between two versions of the page.

software:r:farber [2019-08-30 15:11] anita software:r:farber [2021-03-17 14:44] (current) – [matmul.qs file] anita
Line 328:
 Consider the queue submission script file:
  
-<file matmul.qs>
+<file bash matmul.qs>
 #$ -N matmultiply
  
Line 369:
 </code>
  
 +====== Using R script in batch array job ======
 +===== sweep.R file =====
 +
 +Consider a simple script that prints a fraction computed from its first two command-line arguments:
 +
 +<file R sweep.R>
 +args <- commandArgs(trailingOnly = TRUE)
 +# print fraction from argument list 
 +as.numeric(args[1])/as.numeric(args[2])
 +</file>
 +
 +This is an R script which can be run from the command line on a compute node with the commands:
 +
 +<code bash>
 +qlogin
 +vpkg_require r/3
 +Rscript sweep.R 5 200
 +</code>
 +
 +The output to the screen:
 +<code>
 +[1] 0.025
 +</code>
 +
 +===== sweep.qs file =====
 +
 +Consider the queue script file:
 +
 +<file bash sweep.qs>
 +#$ -N sweep
 +#$ -t 1-200
 +## 
 +## Parameter sweep array job to run sweep.R with
 +##    lambda = 0, 1, 2, ..., 199
 +##
 +
 +# Add vpkg_require commands after this line:
 +vpkg_require r/3
 +
 +date "+Start %s"
 +echo "Host $HOSTNAME"
 +
 +# Task IDs run from 1 to 200, so lambda runs from 0 to 199.
 +lambda=$((SGE_TASK_ID - 1))
 +taskCount=200
 +
 +# Syntax: Rscript [options] filename.R [arguments]
 +Rscript --vanilla sweep.R $lambda $taskCount
 +
 +date "+Finish %s"
 +</file>
 +
 +The ''date'' and ''echo Host'' lines are just a way of keeping track of when and where each task runs.
 +There will be 200 array tasks, all running the same script with different parameters (arguments).  The ''--vanilla'' option
 +keeps the tasks from reading the same startup files or saving and restoring a shared workspace.
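 +
 +The mapping from task ID to script arguments can be checked by hand before submitting. The following is only a minimal sketch: ''task_command'' is a hypothetical helper name, and ''SGE_TASK_ID'' is set manually here, whereas Grid Engine sets it for you in a real array job. It prints the command each task would run without running it.

```bash
#!/bin/bash
# Dry run: print the Rscript command a given array task would execute.
# task_command is a hypothetical helper, not part of Grid Engine or R.
task_command() {
    task_id=$1
    task_count=$2
    lambda=$((task_id - 1))   # task IDs start at 1, lambda starts at 0
    echo "Rscript --vanilla sweep.R $lambda $task_count"
}

# Emulate the first, sixth and last task of an array submitted with -t 1-200.
for SGE_TASK_ID in 1 6 200; do
    echo "task $SGE_TASK_ID: $(task_command "$SGE_TASK_ID" 200)"
done
```

 +Task 6 passes the arguments 5 and 200, the same values used in the interactive ''Rscript sweep.R 5 200'' example.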
 +
 +To run this in batch you must submit the job from the head node with the
 +''qsub'' command.
 +
 +<code>
 +qsub sweep.qs
 +</code>
 +
 +After the job completes, the output of the script will appear in the files
 +''sweep.o535064.1'' to ''sweep.o535064.200''. The number 535064 is the job ID assigned to your job when it was submitted, and 1 to 200 is the task ID (i.e. it corresponds to ''-t 1-200''). For example, the file for task 6 (lambda = 5) contains:
 +
 +<code>
 +Adding dependency `x11/RHEL6.1` to your environment
 +Adding package `r/3.0.2` to your environment
 +[1] 0.025
 +</code>
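 +
 +With 200 output files, you may want to gather the printed values into one list. The sketch below assumes each file contains a single result line of the form ''[1] value'', as in the output shown above; ''collect_results'' is a hypothetical helper name, and the job ID must be replaced with your own.

```bash
#!/bin/bash
# collect_results NAME JOBID NTASKS
# Print "taskid value" for each task output file NAME.oJOBID.TASKID,
# or "taskid missing" if that file does not exist (yet).
# collect_results is a hypothetical helper name.
collect_results() {
    name=$1
    jobid=$2
    ntasks=$3
    task=1
    while [ "$task" -le "$ntasks" ]; do
        file="$name.o$jobid.$task"
        if [ -f "$file" ]; then
            # R prints the result as e.g. "[1] 0.025"; keep the second field.
            value=$(awk '$1 == "[1]" { print $2 }' "$file")
            echo "$task $value"
        else
            echo "$task missing"
        fi
        task=$((task + 1))
    done
}

# Example (job ID taken from the text above):
#   collect_results sweep 535064 200 > sweep-results.txt
```

 +Reporting missing files explicitly makes it easy to spot tasks that have not finished or that failed.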
 +<note tip>
 +You will want to do more than just print out one fraction in your script.  The integer parameter can be used for
 +a one-dimensional parameter sweep, to construct unique input and output file names for each task,
 +or as a seed for the R random number generator (RNG).</note>
 +
 +==== Writing files from an array job ====
 +
 +You are running many tasks in the same directory.  Grid Engine handles the standard output by writing to
 +separate files with the task ID appended after the job ID (''.taskid'').  You need to take care of any other file output in your R script.
 +
 +<note important>
 +You need to make sure no two of your jobs will write to the same file.  Look at your R script to see if you
 +are writing files.  Look for the ''**sink**'' command or any graphics writing commands such as ''**pdf**'' or ''**png**''.
 +If you are using these R functions, then use a unique file name constructed from the task id.
 +</note>
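 +
 +One way to get unique names is to build them from the task ID in the queue script and hand them to the R script. This is a sketch under assumptions: ''task_file'' is a hypothetical helper, the ''sweep-plot'' prefix is made up for illustration, and your R script would need to accept the file name as an extra argument (''sweep.R'' as shown does not).

```bash
#!/bin/bash
# task_file PREFIX TASKID EXT
# Build a per-task file name such as sweep-plot-0007.pdf.
# task_file is a hypothetical helper name.
task_file() {
    prefix=$1
    task_id=$2
    ext=$3
    # Zero-pad the task ID so the files sort in task order.
    printf '%s-%04d.%s\n' "$prefix" "$task_id" "$ext"
}

# Grid Engine sets SGE_TASK_ID in an array job; default to 1 when testing by hand.
task_id=${SGE_TASK_ID:-1}
plot_file=$(task_file sweep-plot "$task_id" pdf)
echo "task $task_id would write $plot_file"
```

 +Because every task ID is distinct within one array job, no two tasks get the same file name.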
 +
 +==== vanilla option ====
 +
 +The command-line option ''--vanilla'' combines ''--no-save'', ''--no-restore'', ''--no-site-file'', ''--no-init-file'' and ''--no-environ''.  This way the tasks will not
 +be reading or writing the same files.  If you need initialization commands, put them in your R script instead of in
 +the init file ''.Rprofile''.  If you need some environment variables, export them in your bash script instead of assigning
 +them in your environ file ''.Renviron''.
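 +
 +Exporting a variable in the queue script might look like the sketch below. ''SWEEP_DATA_DIR'' is a hypothetical variable name used only for illustration; the ''Rscript'' line runs only if R is available in the environment.

```bash
#!/bin/bash
# Set variables in the queue script instead of in .Renviron.
# SWEEP_DATA_DIR is a hypothetical variable name used only for illustration.
export SWEEP_DATA_DIR="$PWD/data"

# Any child process started from this script inherits the exported value.
sh -c 'echo "child sees SWEEP_DATA_DIR=$SWEEP_DATA_DIR"'

# With R available, the same value can be read through Sys.getenv().
if command -v Rscript >/dev/null 2>&1; then
    Rscript --vanilla -e 'cat(Sys.getenv("SWEEP_DATA_DIR"), "\n")'
fi
```

 +The value is set per job in the script itself, so it does not depend on any file in your home directory.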
  
  • software/r/farber.1567192293.txt.gz
  • Last modified: 2019-08-30 15:11
  • by anita