====== Matlab on Mills ======
On Mills, MATLAB projects should be developed using a desktop installation of MATLAB and then copied to the cluster
to be run in batch. Here we
consider an extended MATLAB example involving two simple MATLAB functions and two MATLAB scripts: one that executes
the first function in a loop, and one that uses the Parallel Computing Toolbox.
Details on how to run these two scripts in batch are given with the resulting output files. There is also a
section with UNIX commands you can use to watch your jobs and gather [[#timings-and-core-count | timing and core count]] numbers.
You will need to know how much memory and how many cores you should request for your jobs.
Even though it is easier to develop on a desktop, MATLAB can also be run interactively on Mills.
Two interactive jobs are demonstrated. One shows how to test the function by executing it one time. A
second example shows an interactive session that starts a pool of MATLAB workers to execute the function in a Parallel Computing Toolbox loop, **''parfor''**.
The Parallel Computing Toolbox gives a faster time to completion, but consumes more memory and CPU resources.

Many MATLAB research projects fall into the "high throughput computing" category: one run can be done on the desktop, but it is desired to complete 100s or 1000s of independent runs. This greatly increases the disk, memory and CPU requirements.
Thus we have a
final example that gives the recommended workflow to scale your job to multiple nodes. The second MATLAB function has added logic to enable it to be deployed as compiled MATLAB code. It is then deployed as a single-threaded Grid Engine array job.
The MATLAB distributed computing server (MDCS) is not installed on Mills. This means jobs run with the Parallel Computing toolbox can only run on one node. This limits both the size of the job and the number of workers you can use. That is why
an array job of compiled MATLAB is recommended for large jobs.
==== Matlab License Information ====
Matlab licenses are pushed into consumable (global, per-job) integer complexes in Grid Engine and can be checked using
qhost -h global -F
to list the number of unused license seats for each product.
Below is an example representing a snapshot of unused licensed seats for Matlab products on the cluster.
[traine@mills ~]$ qhost -h global -F
HOSTNAME ARCH NCPU LOAD MEMTOT MEMUSE SWAPTO SWAPUS
-------------------------------------------------------------------------------
global - - - - - - -
gc:MLM.Compiler=50.000000
gc:MLM.Aerospace_Blockset=1.000000
gc:MLM.RTW_Embedded_Coder=2.000000
gc:MLM.Robust_Toolbox=150.000000
gc:MLM.Aerospace_Toolbox=1.000000
gc:MLM.Identification_Toolbox=50.000000
gc:MLM.XPC_Target=2.000000
gc:MLM.Econometrics_Toolbox=1.000000
gc:MLM.Real-Time_Workshop=2.000000
gc:MLM.Fuzzy_Toolbox=50.000000
gc:MLM.Video_and_Image_Blockset=1.000000
gc:MLM.Neural_Network_Toolbox=50.000000
gc:MLM.Fin_Instruments_Toolbox=1.000000
gc:MLM.Optimization_Toolbox=44.000000
gc:MLM.MATLAB_Coder=2.000000
gc:MLM.MATLAB=204.000000
gc:MLM.Database_Toolbox=1.000000
gc:MLM.SIMULINK=100.000000
gc:MLM.PDE_Toolbox=48.000000
gc:MLM.GADS_Toolbox=1.000000
gc:MLM.Symbolic_Toolbox=46.000000
gc:MLM.Signal_Toolbox=146.000000
gc:MLM.Financial_Toolbox=1.000000
gc:MLM.Data_Acq_Toolbox=2.000000
gc:MLM.Image_Acquisition_Toolbox=1.000000
gc:MLM.Curve_Fitting_Toolbox=9.000000
gc:MLM.Image_Toolbox=143.000000
gc:MLM.Distrib_Computing_Toolbox=48.000000
gc:MLM.OPC_Toolbox=1.000000
gc:MLM.MPC_Toolbox=50.000000
gc:MLM.Virtual_Reality_Toolbox=1.000000
gc:MLM.Statistics_Toolbox=43.000000
gc:MLM.Signal_Blocks=50.000000
gc:MLM.Instr_Control_Toolbox=2.000000
gc:MLM.MAP_Toolbox=12.000000
gc:MLM.Communication_Toolbox=50.000000
gc:MLM.Control_Toolbox=150.000000
gc:MLM.Wavelet_Toolbox=1.000000
gc:MLM.Bioinformatics_Toolbox=1.000000
gc:MLM.Simulink_Control_Design=50.000000
gc:MLM.Real-Time_Win_Target=1.000000
Matlab jobs can be submitted to require a certain number of license seats to be available before a job will run. If there are inter-license dependencies for toolboxes, then you should specify all the licenses including Matlab and/or Simulink.
For example, if a Matlab job requires the Financial toolbox, then you will also need to specify all the inter-related toolbox licenses required by the Financial toolbox, such as the Statistics and Optimization toolboxes, as well as Matlab itself. See [[http://www.mathworks.com/products/availability|Mathworks System Requirements & Platform Availability by Product]] for complete details.
qsub -l MLM.MATLAB=1,MLM.Financial_Toolbox=1,MLM.Statistics_Toolbox=1,MLM.Optimization_Toolbox=1 ...
Naturally, this isn't an up-to-the-moment mapping, because the license server is not queried constantly. However, the complexes are consumable, and the count of unused seats is refreshed from the license server every 6 minutes.
This will be most helpful when submitting many Matlab jobs that require a toolbox with a low seat count. The jobs will wait for a toolbox seat to become available rather than trying to run and having many of them fail with the "**License checkout failed**" message from MATLAB.
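If you only care about a few products, the ''-F'' option also accepts a comma-separated list of complex names, so you can check just those seat counts before submitting. For example (the toolboxes listed here are only an illustration):

qhost -h global -F MLM.MATLAB,MLM.Financial_Toolbox,MLM.Statistics_Toolbox,MLM.Optimization_Toolbox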
===== Matlab function =====
Sample Matlab function:
% maxEig Maximum Eigenvalue of a random matrix
% Input parameters
% sd - seed for uniform random generator
% dim - size of the square matrix (should be odd)
% Output value
% maxe - maximum real eigenvalue
function maxe = maxEig(sd,dim)
rng(sd);
ev = eig( randn(dim) );
maxe = max( ev(imag(ev)==0) )
end
The examples use this Matlab function to illustrate running Matlab in batch and interactively. The function will be run on multiple cores using multiple computational threads, and with 12 workers from a Matlab pool. Finally, it will be compiled and deployed.
==== Matlab script ====
First, write a Matlab script file. It should have a comment on the first line describing the purpose of the script and have the ''quit'' command on the last line. This script will call the [[#matlab-function|maxEig function]] 200 times and report the average:
% script to run maxEig function 200 times
count = 200;
dim = 5001;
sumMaxe = 0;
tic;
for i=1:count;
sumMaxe = sumMaxe + maxEig(i,dim);
end;
toc
avgMaxEig = sumMaxe/count
quit
This is a detailed script example, which calls the maxEig function. It does no file I/O; all the I/O is to standard out. In Matlab, assignments not terminated by a semicolon are displayed on the screen (standard out in batch).
Several MATLAB commands could be added to the beginning of this script to set the maximum number of computational threads to the number of slots assigned to your job. If the scheduler uses cgroups to limit your job's core count, then these commands are not necessary.
% Read the number of Grid Engine slots assigned to this job
[compThreads,count]=sscanf(getenv('NSLOTS'),'%d');
if count == 1
  % maxNumCompThreads returns the previous (automatic) thread count
  warning('off','MATLAB:maxNumCompThreads:Deprecated');
  autoCompThreads = maxNumCompThreads(compThreads);
  disp(sprintf('NumCompThreads=%d, was %d',compThreads,autoCompThreads))
end
See [[maxNumCompThreadsGridEngine|Setting maximum number of computational threads]]
This script ends in a **__quit__** command (equivalent to MATLAB **__exit__**). It is meant to be a complete script, which
terminates MATLAB when done. If you run it from the bash command line with the ''-r script'' option, you will get a bash prompt back when it completes. If it is run from a batch job, then you can run other commands in your batch script after the MATLAB script completes.
Without the **__quit__**, an interactive job will return to the MATLAB prompt on completion. If this is the last line of a batch queue script, then the only difference will be the MATLAB prompt ''>>'' at the very end of the output file. MATLAB treats the end of a batch script file the same as closing the window, which is the preferred way to exit the MATLAB GUI.
==== Grid Engine script ====
Second, write a shell script file to set the Matlab environment and start Matlab running your script file. The following script files will set the Matlab environment and run the commands in the [[#matlab-script|script.m]] file; the second one wraps the script in a ''try/catch'' block to trap errors:
''batch-simple.qs'':
#$ -N script-simple.m
#$ -m ea
#$ -M traine@gmail.com
#$ -pe threads 12
vpkg_require matlab/r2013a
matlab -nodisplay -nojvm -r script

''batch-catcherr.qs'':
#$ -N script-catcherr.m
#$ -m ea
#$ -M traine@gmail.com
#$ -pe threads 12
vpkg_require matlab/r2013a
matlab -nodisplay -nojvm -r 'try; script; catch err; disp(err.getReport); end'
The ''-nodisplay'' option indicates no X11 graphics, which implies no introductory splash panel (''-nosplash''). The ''-nojvm'' option indicates no Java. Java is needed for printing graphics, but should be excluded for most computational jobs.
The ''-r'' is followed by a list of Matlab commands, enclosed in quotes.
**Errors in the Matlab script**
In the second script, the list of commands is a ''try/catch'' block enclosing one command. The command ''script'' will execute the lines in the ''script.m'' file. The purpose of the ''try/catch'' is to handle the possibility that
there is an error in the script. For some errors Matlab will display the error message and wait for a response, which is clearly not appropriate for a batch job.
**Graphics in the Matlab script**
* Do not include the ''-nojvm'' on the **matlab** command.
* Do set paper dimensions and print each figure to a file.
The text output will be included in the standard Grid Engine output file, but not any graphics. All figures must be exported using the **print** command. Normally the **print** command will print on an 8 1/2 by 11 inch page with margins that are for a printed page of paper. The size and margins will not work if you plan to include the figure in a paper or a web page.
We suggest setting the current figure's ''PaperUnits'', ''PaperSize'' and ''PaperPosition''. Matlab provides a handle to the current figure (**gcf**). For example, the commands
set(gcf,'PaperUnits','inches','PaperSize',[4,3],'PaperPosition',[0 0 4 3]);
print('-dpng','-r100','maxe.png');
will set the current figure to be 4 x 3 inches with no margins, and then print the figure as a 400x300 pixel ''png'' file (at 100 dpi).
==== Submit job ====
Third, from the directory with both ''script.m'' and either ''batch-simple.qs'' or ''batch-catcherr.qs'', submit the batch job with the command:
qsub batch-simple.qs
or
qsub batch-catcherr.qs
See [[#matlab-license-information|Matlab licenses]] for details on specifying resources if there are a limited number of license seats available for particular toolboxes.
In this example you will need a license for base Matlab, and the parallel toolbox needs one license. We are using the default local scheduler, which gives you 12 workers on the same node with one license.
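If Matlab seats are scarce, the license complexes described in the [[#matlab-license-information|Matlab licenses]] section can be added to the submit command. For example, to hold the job until one base Matlab seat is free:

qsub -l MLM.MATLAB=1 batch-simple.qs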
==== Wait for completion ====
Finally, wait for the mail notification, which will be sent to ''traine@gmail.com''. When the job is done the output from the Matlab command will be in a file with the pattern - ''script.m.oJOBID'', where JOBID is the number assigned to your job.
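While waiting, you can check on the job from the head node using the job ID reported by ''qsub'' (236214 for the run shown below). For example:

qstat -u traine          # list your pending and running jobs
qstat -j 236214          # detailed information about one job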
After waiting for about 5 hours, a message was received with the subject line "Grid Engine Job Scheduler":
Job 236214 (script.m) Complete
User = traine
Queue = it_css.q+@n017
Host = n017.mills.hpc.udel.edu
Start Time = 03/27/2013 16:17:30
End Time = 03/27/2013 21:11:46
User Time = 1:10:47:28
System Time = 00:30:04
Wallclock Time = 04:54:16
CPU = 1:11:17:33
Max vmem = 1.976G
Exit Status = 0
==== Gather results ====
The results for Job 236214 are in the file ''script.m.o236214'':
Adding dependency `x11/RHEL6.1` to your environment
Adding package `matlab/r2013a` to your environment
< M A T L A B (R) >
Copyright 1984-2013 The MathWorks, Inc.
R2013a (8.1.0.604) 64-bit (glnxa64)
February 15, 2013
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
maxe =
70.0220
//Skipping 198 similar displays of variable maxe//
maxe =
67.4221
Elapsed time is 17651.125420 seconds.
avgMaxEig =
69.5131
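Since all of the output goes to standard out, the final numbers can be pulled from the output file with ordinary UNIX commands, for example:

grep 'Elapsed time' script.m.o236214      # tic/toc timing
grep -A 2 'avgMaxEig' script.m.o236214    # the final average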
==== Timing and core count ====
While the batch job was running on node ''n017'', the **top** command was run to sample the resources being used by Matlab
every second, two times (''-b -d 1 -n 2''). The ''-H'' option was used to display each individual thread, rather than a summary of all threads in a process.
$ ssh n017 top -H -b -d 1 -n 2 | egrep '(COMMAND|MATLAB)' | grep -v 'S 0'
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2158 traine 20 0 2011m 557m 54m R 101.6 0.9 30:19.69 MATLAB
2166 traine 20 0 2011m 557m 54m R 72.9 0.9 16:53.48 MATLAB
2167 traine 20 0 2011m 557m 54m R 72.9 0.9 16:53.28 MATLAB
2171 traine 20 0 2011m 557m 54m R 72.9 0.9 16:53.08 MATLAB
2173 traine 20 0 2011m 557m 54m R 72.9 0.9 16:53.12 MATLAB
2174 traine 20 0 2011m 557m 54m R 72.9 0.9 16:52.85 MATLAB
2175 traine 20 0 2011m 557m 54m R 72.9 0.9 16:52.98 MATLAB
2176 traine 20 0 2011m 557m 54m R 72.9 0.9 16:52.86 MATLAB
2168 traine 20 0 2011m 557m 54m R 70.9 0.9 16:52.88 MATLAB
2169 traine 20 0 2011m 557m 54m R 70.9 0.9 16:52.83 MATLAB
2170 traine 20 0 2011m 557m 54m R 70.9 0.9 16:52.93 MATLAB
2172 traine 20 0 2011m 557m 54m R 70.9 0.9 16:52.70 MATLAB
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
2158 traine 20 0 2011m 557m 54m R 99.2 0.9 30:20.71 MATLAB
2172 traine 20 0 2011m 557m 54m R 83.7 0.9 16:53.56 MATLAB
2166 traine 20 0 2011m 557m 54m R 82.7 0.9 16:54.33 MATLAB
2167 traine 20 0 2011m 557m 54m R 82.7 0.9 16:54.13 MATLAB
2168 traine 20 0 2011m 557m 54m R 82.7 0.9 16:53.73 MATLAB
2169 traine 20 0 2011m 557m 54m R 82.7 0.9 16:53.68 MATLAB
2170 traine 20 0 2011m 557m 54m R 82.7 0.9 16:53.78 MATLAB
2171 traine 20 0 2011m 557m 54m R 82.7 0.9 16:53.93 MATLAB
2173 traine 20 0 2011m 557m 54m R 82.7 0.9 16:53.97 MATLAB
2174 traine 20 0 2011m 557m 54m R 82.7 0.9 16:53.70 MATLAB
2176 traine 20 0 2011m 557m 54m R 82.7 0.9 16:53.71 MATLAB
2175 traine 20 0 2011m 557m 54m R 81.7 0.9 16:53.82 MATLAB
$ ssh n017 mpstat -P ALL 1 2
Linux 2.6.32-279.19.1.el6.x86_64 (n017) 03/27/2013 _x86_64_ (24 CPU)
04:46:56 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
04:46:57 PM all 47.61 0.00 0.96 0.00 0.00 0.00 0.00 0.00 51.44
04:46:57 PM 0 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
04:46:57 PM 1 8.08 0.00 0.00 0.00 0.00 0.00 0.00 0.00 91.92
04:46:57 PM 2 95.00 0.00 2.00 0.00 0.00 0.00 0.00 0.00 3.00
04:46:57 PM 3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
04:46:57 PM 4 0.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 99.00
04:46:57 PM 5 95.00 0.00 2.00 0.00 0.00 0.00 0.00 0.00 3.00
04:46:57 PM 6 93.07 0.00 2.97 0.00 0.00 0.00 0.00 0.00 3.96
04:46:57 PM 7 16.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 84.00
04:46:57 PM 8 75.25 0.00 0.00 0.00 0.00 0.00 0.00 0.00 24.75
04:46:57 PM 9 94.00 0.00 3.00 0.00 0.00 0.00 0.00 0.00 3.00
04:46:57 PM 10 95.00 0.00 2.00 0.00 0.00 0.00 0.00 0.00 3.00
04:46:57 PM 11 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
04:46:57 PM 12 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
04:46:57 PM 13 95.00 0.00 2.00 0.00 0.00 0.00 0.00 0.00 3.00
04:46:57 PM 14 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
04:46:57 PM 15 95.00 0.00 2.00 0.00 0.00 0.00 0.00 0.00 3.00
04:46:57 PM 16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
04:46:57 PM 17 94.95 0.00 2.02 0.00 0.00 0.00 0.00 0.00 3.03
04:46:57 PM 18 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
04:46:57 PM 19 96.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 3.00
04:46:57 PM 20 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
04:46:57 PM 21 94.12 0.00 1.96 0.00 0.00 0.00 0.00 0.00 3.92
04:46:57 PM 22 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
04:46:57 PM 23 95.00 0.00 2.00 0.00 0.00 0.00 0.00 0.00 3.00
04:46:57 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
04:46:58 PM all 49.35 0.00 0.67 0.00 0.00 0.00 0.00 0.00 49.98
04:46:58 PM 0 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
04:46:58 PM 1 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
04:46:58 PM 2 98.99 0.00 1.01 0.00 0.00 0.00 0.00 0.00 0.00
04:46:58 PM 3 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
04:46:58 PM 4 0.00 0.00 2.00 0.00 0.00 0.00 0.00 0.00 98.00
04:46:58 PM 5 98.99 0.00 1.01 0.00 0.00 0.00 0.00 0.00 0.00
04:46:58 PM 6 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
04:46:58 PM 7 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
04:46:58 PM 8 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
04:46:58 PM 9 98.00 0.00 2.00 0.00 0.00 0.00 0.00 0.00 0.00
04:46:58 PM 10 98.02 0.00 1.98 0.00 0.00 0.00 0.00 0.00 0.00
04:46:58 PM 11 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
04:46:58 PM 12 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
04:46:58 PM 13 98.99 0.00 1.01 0.00 0.00 0.00 0.00 0.00 0.00
04:46:58 PM 14 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
04:46:58 PM 15 99.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
04:46:58 PM 16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
04:46:58 PM 17 99.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
04:46:58 PM 18 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
04:46:58 PM 19 98.99 0.00 1.01 0.00 0.00 0.00 0.00 0.00 0.00
04:46:58 PM 20 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
04:46:58 PM 21 98.00 0.00 2.00 0.00 0.00 0.00 0.00 0.00 0.00
04:46:58 PM 22 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
04:46:58 PM 23 98.02 0.00 0.99 0.00 0.00 0.00 0.00 0.00 0.99
==Conclusions==
The main computational task for this batch job is finding all 5001 eigenvalues, which is done by the Matlab built-in function ''eig''. Matlab's built-in matrix functions use MKL, a highly optimized multi-threaded library. It uses what is called dynamic scheduling to use as many cores as needed for the task at hand. In this snapshot it appears as if there is one main thread and 11 other threads that are called on for vector operations (BLAS). The 11 threads have similar times, which means they are being used for data-parallel operations (they are all doing the same operations on different parts of the vector).
We have the correct Grid Engine options ''-pe threads 12'' for this case.
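Once the job has finished, the same timing and memory numbers that appear in the mail notification can also be recovered from the Grid Engine accounting records, for example:

qacct -j 236214     # wallclock, cpu and maxvmem for the completed job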
[[software:matlab:mills:matlab-two-samenode|Two Matlab interactive jobs on same node]]
===== Batch parallel example =====
The Matlab parallel toolbox uses the JVM to manage the workers and to communicate with them while you are running. You
need to set up the Matlab pool in your ''script''.
==== Matlab script ====
Here are the slightly modified MATLAB script and qsub script files. These can be used to do the same task with a pool of Matlab workers (20 in the scripts below).
Add commands to open and close the pool (''parpool'' and ''delete'') and change ''for'' ⇒ ''parfor''.
% script to run maxEig function 200 times
mypool=parpool(20);
count = 200;
dim = 5001;
sumMaxe = 0;
tic;
parfor i=1:count;
sumMaxe = sumMaxe + maxEig(i,dim);
end;
toc
avgMaxEig = sumMaxe/count
delete(mypool);
quit
==== Grid Engine script ====
Take out ''-nojvm'', since the JVM is needed for the Matlab pool.
#$ -N pscript
#$ -m ea
#$ -M traine@gmail.com
#$ -l m_mem_free=3.1G
#$ -l MLM.Distrib_Computing_Toolbox=1
#$ -pe threads 20
vpkg_require matlab/r2015a
ls
matlab -nodisplay -r 'pscript'
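Submit the job from the directory containing both files. The queue script file name below is only an example; use the name you saved the script under:

qsub pscript.qs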
===== Interactive example =====
This example is based on being in your workgroup environment, cd'ing to your MATLAB project directory and starting an interactive session on a compute node.
==== Scheduling exclusive interactive job ====
$ qlogin -l exclusive=1
Your job 2493 ("QLOGIN") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 2493 has been successfully scheduled.
Establishing /opt/shared/univa/local/qlogin_ssh session to host n036 ...
==== Starting a command mode matlab session ====
$ vpkg_require matlab/r2014b
Adding package `matlab/r2014b` to your environment
$ matlab -nodesktop -nosplash
MATLAB is selecting SOFTWARE OPENGL rendering.
< M A T L A B (R) >
Copyright 1984-2014 The MathWorks, Inc.
R2014b (8.4.0.150421) 64-bit (glnxa64)
September 15, 2014
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
==== Using help as the first command ====
>> help maxEig
maxEig Maximum Eigenvalue of a random matrix
Input parameters
sd - seed for uniform random generator
dim - size of the square matrix (should be odd)
Output value
maxe - maximum real eigenvalue
==== Calling function once ====
Use the **tic** and **toc** commands to report the elapsed time to generate the random matrix, find all the eigenvalues, and report the maximum real eigenvalue.
>> tic; maxEig(1,5001); toc
maxe =
70.0220
Elapsed time is 86.564581 seconds.
==== Finishing up ====
>> exit
$ exit
Connection to n036 closed.
/opt/shared/univa/local/qlogin_ssh exited with exit code 0
===== Interactive parallel toolkit example =====
When you plan to use the parallel toolbox, you should log on to a compute node with the command
qlogin -pe threads 12
This will effectively reserve 12 slots for your Matlab workers. Twelve is the default, and maximum allowed, value for the current version of Matlab (r2013a).
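If Parallel Computing Toolbox seats are in short supply, the license complexes from the [[#matlab-license-information|Matlab licenses]] section can be added to the same command, for example:

qlogin -pe threads 12 -l MLM.MATLAB=1,MLM.Distrib_Computing_Toolbox=1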
The same example function is called from 12 Matlab workers (the maximum allowed for the local profile).
Here we start 12 workers with ''matlabpool'', and then use ''parfor'' to send a different seed to each worker. It is important that the workers do not all do the same computations. The output is from the workers, in the order they complete. The order is not deterministic.
It took about 150 seconds to start all 12 workers, and this is not reflected in the total elapsed time.
>> matlabpool(12);
Starting matlabpool using the 'local' profile ... connected to 12 workers.
>> tic; parfor sd = 1:12; maxEig(sd,5001); end; toc
maxe =
70.8182
maxe =
70.4677
maxe =
70.2538
maxe =
62.0506
maxe =
70.9143
maxe =
66.4947
maxe =
70.5141
maxe =
70.8709
maxe =
69.1104
maxe =
63.9266
maxe =
70.6306
maxe =
70.4674
Elapsed time is 207.531791 seconds.
===== MCR (Matlab Compiler Runtime) array job =====
Most Matlab functions can be compiled using the Matlab Compiler (MCC) and then deployed to run on the compute nodes in the MATLAB Compiler Runtime (MCR). The MCR is a prerequisite for deployment, and is installed on all the compute nodes. You must use VALET to set up the libraries you will need to run your function from the command line. You do not need to use the shell (''.sh'' file) that the compiler creates.
There are two ways to run compiled MATLAB jobs in a shared environment, such as Mills.
- Compile to produce an executable that uses a single computational thread - MATLAB option ''-singleCompThread''
- Submit the job to use the nodes exclusively - Grid Engine option ''-l exclusive=1''
You can run more jobs on each node when they are compiled to use just one core (single computational thread). This will give
you higher throughput for an array job, but not higher performance.
==== Example function ====
As an example, consider the Matlab function with one parameter:
function maxe = maxEig(lam)
  % When deployed, command-line arguments arrive as strings
  if (isdeployed)
    lam=str2num(lam);
  end
  dim=10000;
  M=rand(dim);
  D=diag(rand(dim,1));
  maxe = max(eig( M + lam*D ))
end
The MATLAB function ''maxEig'' takes one numeric parameter as an argument (''lam''), constructs a random matrix of the form ''M + lam*D'' and then assigns ''maxe'' to the largest eigenvalue of the matrix.
We should get different ''maxe'' values depending on ''lam''.
==== Example compiler commands ====
We can convert this function into a standalone executable that uses a single computational thread by using the Matlab compiler ''mcc''. Type the commands
prog=maxEig
opt='-nojvm,-nodisplay,-singleCompThread'
version='r2012b'
vpkg_require matlab/$version
mcc -R "$opt" -mv $prog.m
**Keep these commands in a file**: Even though this is just two commands, we recommend you keep these commands, including the shell assignment statements, as a record of the MATLAB version and options you used to create the executable ''maxEig''. You will need to know these if you want to use the executable in a shell script. You can source this file when you want to rebuild ''maxEig''.
**You can get mcc usage instructions with ''mcc -help''**: The string following the ''-R'' flag is the list of Matlab
options you want to use at run time. The ''-m'' option tells mcc to build a standalone application to be deployed
using the MCR. The ''-v'' option is for verbose mode.
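Once built, the standalone executable can be tested from the command line of a compute node. A minimal sketch, using the same MCR VALET package as the queue script below (the value 5 for ''lam'' is just an illustration):

vpkg_require mcr/r2012b-nojvm
./maxEig 5     # prints the maximum eigenvalue for lam = 5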
==== Example queue script file ====
The ''mcc'' command will also give you a ''.sh'' file, which you can use to set up your environment and run the command. That file does not use VALET and does not contain the Grid Engine directives, so we suggest you use the following ''.qs'' file as an example to modify for your own job.
#$ -N maxEig
#$ -t 1-200
#
# Parameter sweep array job to run the maxEig compiled MATLAB function with
# lambda = 0, 1, ... 199 (SGE_TASK_ID - 1)
#
date "+Start %s"
echo "Host $HOSTNAME"
let lambda="$SGE_TASK_ID-1"
source /opt/shared/valet/docs/valet.sh
vpkg_require mcr/r2012b-nojvm
export MCR_CACHE_ROOT=$TMPDIR
./maxEig $lambda
date "+Finish %s"
The two ''date'' commands will record the start and finish times in seconds for each task. These can be used to compute the elapsed time, and the echoed host name can be used to determine the overlapping use of the compute nodes.
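For example, a task's elapsed time can be computed from its output file; the file name below is hypothetical (array task output follows the ''maxEig.oJOBID.TASKID'' pattern):

out=maxEig.o236300.1                         # hypothetical job and task IDs
start=$(awk '/^Start/ {print $2}' $out)
finish=$(awk '/^Finish/ {print $2}' $out)
echo "Elapsed: $((finish - start)) seconds"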
==== Compiled MATLAB in owner queues ====
To test the example compiled Matlab job on the three ''it_css'' owner queues, we first modify the first few lines of the queue script file:
#$ -t 1-6
#$ -l exclusive=1
This will do 6 tasks with exclusive access to the node. That is, each node will do 2 tasks consecutively.
Next, we compile two copies of the program:
- Multiple computational threads (the default).
- Single computational thread (''-singleCompThread'').
**Times for two runs**
Multiple computational threads used a total of 16929.669 CPU seconds over 2226 seconds of elapsed time
on 3 nodes
^ Array job started Tue 26 Feb 2013 07:59:48 PM EST ^^^^^^
^ Node ^^ Real Clock Time (seconds) ^^^ Ratio ^
^ Name ^ Count ^ Min ^ Max ^ Average ^ User/Real ^
| n015| 2| 1073.38| 1076.94| 1075.16 | 2.86154|
| n016| 2| 737.28| 737.98| 737.63 | 3.68737|
| n017| 2| 712.33| 715.36| 713.84 | 3.73799|
Single computational thread used a total of 8568.438 CPU seconds over 3351 seconds of elapsed time
on 3 nodes
^ Array job started Mon 25 Feb 2013 08:46:29 PM EST ^^^^^^
^ Node ^^ Real Clock Time (seconds) ^^^ Ratio ^
^ Name ^ Count ^ Min ^ Max ^ Average ^ User/Real ^
| n015| 2| 1654.88| 1662.45| 1658.66 | 0.99852|
| n016| 2| 1342.72| 1349.24| 1345.98 | 0.99821|
| n017| 2| 1286.60| 1287.27| 1286.93 | 0.99807|
**Predicted times for 200 tasks**
In the multiple computational threads case, all the cores are being used by one Matlab job, so for 200 tasks we can continue to run 3 at a time. The median node time for one task is 737.63 seconds, so the expected elapsed time for 200 tasks is
Time = 737.63 * 200 / 3 = 49175.33 seconds ≈ 13.7 hours
In the single computational thread case, only one core is being used by each task, so for 200 tasks we can remove the exclusive access and run 72 tasks at a time on the 3 nodes. The median node time for one task is 1345.98 seconds, so the expected elapsed time for 200 tasks is
Time = 1345.98 * 200 / 72 = 3738.83 seconds ≈ 1 hour
=== SGE array job started Fri 22 Feb 2013 11:57:07 AM EST ===
Command to submit 200 jobs.
qsub maxEig.qs
All 200 jobs used a total of 680420.993 CPU seconds over 10916 seconds of elapsed time
on 3 nodes:
^ Node ^^ Real Clock Time (seconds) ^^^ Ratio ^
^ Name ^ Count ^ Min ^ Max ^ Average ^ User/Real ^
| n015| 62| 2500.36| 4268.85| 3478.68 | 0.97593|
| n016| 70| 3042.86| 4010.68| 3504.29 | 0.98359|
| n017| 68| 2957.71| 3993.39| 3441.66 | 0.97704|
{{:clusters:mills:nostandby.png?640|}}
==== Compiled MATLAB standby queue ====
=== SGE array job started Fri 22 Feb 2013 04:05:28 PM EST ===
Command to submit 200 jobs.
qsub -l standby=1 maxEig.qs
All 200 jobs used a total of 613305.888 CPU seconds over 5388 seconds of elapsed time
on 9 nodes:
^ Node ^^ Real Clock Time (seconds) ^^^ Ratio ^
^ Name ^ Count ^ Min ^ Max ^ Average ^ User/Real ^
| n073| 24| 2420.63| 3696.83| 3147.61 | 0.85789|
| n075| 24| 3231.81| 3762.60| 3545.73 | 0.97684|
| n092| 24| 3216.97| 3816.26| 3547.40 | 0.97047|
| n133| 24| 3706.22| 5385.36| 5015.40 | 0.58127|
| n161| 24| 3188.36| 3615.85| 3428.45 | 0.97021|
| n162| 24| 2941.85| 3475.67| 3210.88 | 0.98208|
| n167| 24| 3412.76| 4691.94| 4074.67 | 0.59991|
| n179| 24| 3025.63| 3964.92| 3540.22 | 0.97072|
| n187| 8| 1844.05| 2236.47| 2044.01 | 0.98612|
{{:clusters:mills:standby.png?640|}}