Using an R script in a batch array job

Consider this simple script, which prints a fraction computed from the first two arguments in its argument list:

sweep.R
args <- commandArgs(trailingOnly = TRUE)
# print fraction from argument list 
as.numeric(args[1])/as.numeric(args[2])

This is an R script which can be run from the command line on a compute node with the commands:

qlogin
vpkg_require r/3
Rscript sweep.R 5 200

The output to the screen:

[1] 0.025

Consider the queue script file

sweep.qs
#$ -N sweep
#$ -t 1-200
## 
## Parameter sweep array job to run sweep.R with
##    lambda = 0, 1, 2, ..., 199
##
 
# Add vpkg_require commands after this line:
vpkg_require r/3
 
date "+Start %s"
echo "Host $HOSTNAME"
 
let lambda="$SGE_TASK_ID-1"
let taskCount=200
 
# Syntax: Rscript [options] filename.R [arguments]
Rscript --vanilla sweep.R $lambda $taskCount
 
date "+Finish %s"

The date and echo Host lines are just a way of keeping track of when and where each task runs. The array job consists of 200 tasks, all running the same script with different parameters (arguments). The --vanilla option is used to keep the tasks from reading or writing the same startup and workspace files.

To run this in batch you must submit the job from the head node with the qsub command.

qsub sweep.qs

After the job completes, the output of the script will appear in the files sweep.o535064.1 to sweep.o535064.200. The number 535064 is the job ID assigned to your job when it was submitted, and 1 to 200 are the task IDs (i.e. they correspond to the -t 1-200 option). An output file includes lines such as:

Adding dependency `x11/RHEL6.1` to your environment
Adding package `r/3.0.2` to your environment
[1] 0.025
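
Once all tasks have finished, the per-task results can be gathered into a single table. The following is only a minimal sketch: it assumes the job name sweep and that each output file holds one R result line beginning with [1]. The function name collect_results and the "taskid value" layout are illustrative and not part of the original scripts.

```shell
#!/bin/bash
# collect_results JOBID [TASKCOUNT]
# Prints "taskid value" for every task output file sweep.oJOBID.TASKID found.
# Assumes each file holds one R result line of the form "[1] 0.025".
collect_results() {
    local jobid=$1 count=${2:-200} task file
    for ((task = 1; task <= count; task++)); do
        file="sweep.o${jobid}.${task}"
        if [[ -f $file ]]; then
            # Keep only the value printed by R, dropping the "[1]" index.
            awk -v t="$task" '$1 == "[1]" { print t, $2 }' "$file"
        else
            echo "missing output for task $task" >&2
        fi
    done
}
```

For the job shown above this could be called as collect_results 535064 > results.txt from the directory holding the output files.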

You will want to do more than just print out one fraction in your script. The integer parameter can be used for a one-dimensional parameter sweep, to construct unique input and output file names for each task, or as a seed for the R Random Number Generator (RNG).
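
As a sketch of the RNG use, the queue script could derive a distinct seed for each task and pass it to R as one more argument. The variable baseSeed, its value, and the helper task_seed are assumptions made for illustration. The R script would have to read the extra argument itself, for example with set.seed(as.integer(args[3])).

```shell
#!/bin/bash
# Derive a reproducible, task-specific seed inside the queue script.
# baseSeed is an arbitrary illustrative constant, not part of the original job.
baseSeed=1000

# SGE_TASK_ID is set by Grid Engine for array jobs; default to 1 so the
# snippet can also be tried interactively.
taskId=${SGE_TASK_ID:-1}

# task_seed TASKID: prints baseSeed + TASKID
task_seed() {
    echo $(( baseSeed + $1 ))
}

seed=$(task_seed "$taskId")
echo "task $taskId uses seed $seed"

# In sweep.qs the call could then become (sweep.R must read the third argument):
#   Rscript --vanilla sweep.R $lambda $taskCount $seed
```

Using a fixed base plus the task ID keeps every task's random stream distinct while still letting a single task be rerun with the same seed.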

You are running many tasks in the same directory. Grid Engine handles the standard output by writing it to separate files, with a dot and the task ID appended to the job ID in each file name. You need to take care of any other file output in your R script.

You need to make sure no two of your tasks will write to the same file. Look at your R script to see if you are writing files. Look for the sink command or any graphics writing commands such as pdf or png. If you are using these R functions, then use a unique file name constructed from the task ID.
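
One possible way to get such names, sketched here in the queue script rather than in R, is to build them from SGE_TASK_ID and hand them to the R script as extra arguments. The helper task_file, the prefixes plot and log, and the zero padding are assumptions for illustration. The R script would then pick the names up from commandArgs and pass them to pdf or sink.

```shell
#!/bin/bash
# task_file PREFIX EXT [TASKID]
# Builds a file name that is unique per array task, e.g. plot-007.pdf.
# Falls back to SGE_TASK_ID, or to 1 when run outside Grid Engine.
task_file() {
    local prefix=$1 ext=$2 task=${3:-${SGE_TASK_ID:-1}}
    printf '%s-%03d.%s\n' "$prefix" "$task" "$ext"
}

plotFile=$(task_file plot pdf)
sinkFile=$(task_file log txt)
echo "graphics -> $plotFile, sink -> $sinkFile"

# In sweep.qs the names could be appended to the existing call:
#   Rscript --vanilla sweep.R $lambda $taskCount $plotFile $sinkFile
```

Zero padding the task ID is optional; it only makes the files sort in task order in a directory listing.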

The command-line option --vanilla combines --no-save, --no-restore, --no-site-file, --no-init-file and --no-environ. This way the tasks will not be reading or writing the same files. If you need initialization commands, put them in your R script instead of in the init file .Rprofile. If you need some environment variables, export them in your bash script instead of assigning them in your environ file .Renviron.
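
A minimal sketch of the environment-variable advice, as lines that could be added to the queue script before the Rscript call. R_LIBS_USER is a standard R variable, but the library path given here is hypothetical, and SWEEP_OUTDIR is a made-up name used only to show how an arbitrary setting could be passed along. Inside R, such a value can be read with Sys.getenv("SWEEP_OUTDIR").

```shell
#!/bin/bash
# Settings that would otherwise live in ~/.Renviron, exported from the
# queue script so that they still reach R when --vanilla is used.

# Hypothetical personal library location; adjust to your own setup.
export R_LIBS_USER="$HOME/R/library"

# Made-up variable naming a results directory; keeps any value already set.
export SWEEP_OUTDIR="${SWEEP_OUTDIR:-$PWD/results}"

echo "R_LIBS_USER=$R_LIBS_USER"
echo "SWEEP_OUTDIR=$SWEEP_OUTDIR"
```

Exported variables are inherited by the Rscript process started from the same script, so no per-user environ file needs to be read.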

  • software/r/r-sweep.txt
  • Last modified: 2017-10-23 22:05
  • by sraskar