====== Matlab on Farber ======
For use on Farber, MATLAB projects should be developed using a Desktop installation of MATLAB and then copied to Farber
to be run in batch. Here an extended MATLAB example is considered, involving one simple MATLAB function and two MATLAB scripts: one to execute this function in a loop, and another to execute it in parallel using the Parallel Computing Toolbox.
Details on how to run these two scripts in batch are given with the resulting output files. There is also a
section with UNIX commands you can use to watch your jobs and gather [[#timings-and-core-count | timing and core count]] numbers.
It is important to know how much memory will be needed and how many cores will be used, so that you can set your resource requirements. If you do not ask for enough memory, your job will fail. If you do not ask for enough cores, the job will take longer.
Even though it is easier to develop on a desktop, MATLAB can be run interactively on Farber.
Two interactive jobs are demonstrated. One shows how to test the function by executing it one time. A
second example shows an interactive session that starts a MATLAB pool of workers to execute the function in a Parallel Computing Toolbox loop, **''parfor''**. The Parallel Computing Toolbox gives a faster time to completion, but consumes more memory and CPU resources.
You can run [[#desktop |MATLAB as a desktop (GUI)]] application on Farber, but this is not recommended, as the graphics are slow to display, especially with a slower network connection.
Many MATLAB research projects fall in the "high throughput computing" category. One run can be done on the desktop, but the goal is to complete 100s or 1000s of independent runs. This greatly increases disk, memory and CPU requirements.
Thus we have a
final example that gives the recommended workflow to scale your job to multiple nodes: compile the MATLAB code with the single-thread option and deploy the job as a Grid Engine array job.
The MATLAB distributed computing server (MDCS) is not installed on Farber. This means jobs run with the Parallel Computing toolbox can only run on one node. This limits both the size of the job and the number of workers you can use. That is why
an array job of compiled MATLAB is recommended for large jobs.
===== Matlab License Information for Grid Engine =====
Matlab licenses are pushed into consumable (global, per-job) integer complexes in Grid Engine and can be checked using
qhost -h global -F
to list the number of unused license seats for each product.
Below is an example representing a snapshot of unused licensed seats for Matlab products on the cluster.
[traine@mills ~]$ qhost -h global -F
HOSTNAME ARCH NCPU LOAD MEMTOT MEMUSE SWAPTO SWAPUS
-------------------------------------------------------------------------------
global - - - - - - -
gc:MLM.Compiler=50.000000
gc:MLM.Aerospace_Blockset=1.000000
gc:MLM.RTW_Embedded_Coder=2.000000
gc:MLM.Robust_Toolbox=150.000000
gc:MLM.Aerospace_Toolbox=1.000000
gc:MLM.Identification_Toolbox=50.000000
gc:MLM.XPC_Target=2.000000
gc:MLM.Econometrics_Toolbox=1.000000
gc:MLM.Real-Time_Workshop=2.000000
gc:MLM.Fuzzy_Toolbox=50.000000
gc:MLM.Video_and_Image_Blockset=1.000000
gc:MLM.Neural_Network_Toolbox=50.000000
gc:MLM.Fin_Instruments_Toolbox=1.000000
gc:MLM.Optimization_Toolbox=44.000000
gc:MLM.MATLAB_Coder=2.000000
gc:MLM.MATLAB=204.000000
gc:MLM.Database_Toolbox=1.000000
gc:MLM.SIMULINK=100.000000
gc:MLM.PDE_Toolbox=48.000000
gc:MLM.GADS_Toolbox=1.000000
gc:MLM.Symbolic_Toolbox=46.000000
gc:MLM.Signal_Toolbox=146.000000
gc:MLM.Financial_Toolbox=1.000000
gc:MLM.Data_Acq_Toolbox=2.000000
gc:MLM.Image_Acquisition_Toolbox=1.000000
gc:MLM.Curve_Fitting_Toolbox=9.000000
gc:MLM.Image_Toolbox=143.000000
gc:MLM.Distrib_Computing_Toolbox=48.000000
gc:MLM.OPC_Toolbox=1.000000
gc:MLM.MPC_Toolbox=50.000000
gc:MLM.Virtual_Reality_Toolbox=1.000000
gc:MLM.Statistics_Toolbox=43.000000
gc:MLM.Signal_Blocks=50.000000
gc:MLM.Instr_Control_Toolbox=2.000000
gc:MLM.MAP_Toolbox=12.000000
gc:MLM.Communication_Toolbox=50.000000
gc:MLM.Control_Toolbox=150.000000
gc:MLM.Wavelet_Toolbox=1.000000
gc:MLM.Bioinformatics_Toolbox=1.000000
gc:MLM.Simulink_Control_Design=50.000000
gc:MLM.Real-Time_Win_Target=1.000000
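When only one product matters, the snapshot above can be filtered rather than read by eye. The following is a minimal sketch, assuming the output keeps the ''gc:MLM.<product>=<count>'' layout shown above; the function name ''seats_free'' is hypothetical and not part of Grid Engine.

```shell
# Print the number of unused seats for one product, reading
# `qhost -h global -F` style output on standard input.
# seats_free is a hypothetical helper name, not a Grid Engine command.
seats_free() {
    awk -F= -v key="gc:MLM.$1" '
        {
            name = $1
            gsub(/^[ \t]+|[ \t]+$/, "", name)   # drop indentation around the name
        }
        name == key { printf "%d\n", $2; found = 1 }
        END { exit(found ? 0 : 1) }             # non-zero status if not listed
    '
}

# Two lines copied from the snapshot above stand in for live output.
printf '%s\n' 'gc:MLM.MATLAB=204.000000' 'gc:MLM.Financial_Toolbox=1.000000' |
    seats_free Financial_Toolbox
```

On the cluster the live output would be piped in instead, for example ''qhost -h global -F | seats_free Financial_Toolbox''.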
Matlab jobs can be submitted to require a certain number of license seats to be available before a job will run. If there are inter-license dependencies for toolboxes, then you should specify all the licenses, including Matlab and/or Simulink.
For example, if a Matlab job requires the Financial toolbox, then you will also need to specify all the inter-related toolbox licenses required by the Financial toolbox, such as the Statistics and Optimization toolboxes, as well as Matlab itself. See [[http://www.mathworks.com/products/availability|Mathworks System Requirements & Platform Availability by Product]] for complete details.
qsub -l MLM.MATLAB=1,MLM.Financial_Toolbox=1,MLM.Statistics_Toolbox=1,MLM.Optimization_Toolbox=1 ...
Naturally, this is not a to-the-moment mapping, because the license server is not queried constantly. However, the resource is consumable, so Grid Engine keeps track of how many seats are unused, refreshing the count every 6 minutes.
This is most helpful when submitting many Matlab jobs that require a toolbox with a low seat count. They will wait for a toolbox seat to become available, rather than trying to run and many of them getting the "**License checkout failed**" message from MATLAB.
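The ''-l'' request grows quickly once dependencies are included, so it can help to build it from a list of names. This is a small sketch, assuming one seat per product and the ''MLM.'' prefix used in the example above; ''license_request'' is a hypothetical helper name.

```shell
# Build a comma-separated Grid Engine resource request, one seat per product.
# license_request is a hypothetical helper, not part of Grid Engine.
license_request() {
    local out="" name
    for name in "$@"; do
        out="${out:+$out,}MLM.${name}=1"
    done
    printf '%s\n' "$out"
}

license_request MATLAB Financial_Toolbox Statistics_Toolbox Optimization_Toolbox
```

The result could then be passed to the scheduler as ''qsub -l "$(license_request MATLAB Financial_Toolbox)" ...''.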
===== Matlab function =====
We will use this sample function on the Farber cluster in multiple demonstrations:
function maxe = maxEig(sd,dim)
% maxEig maximum real eigenvalue of a normally distributed random matrix
% Input parameters
% sd - seed for random generator
% dim - size of the square matrix
% Output value
% maxe - maximum real eigenvalue
if (isdeployed)
sd = str2num(sd)
dim = str2num(dim)
end
rng(sd);
ev = eig( randn(dim) );
maxe = max( ev(imag(ev)==0) )
end
This page will use this MATLAB function to illustrate using Matlab in batch and interactively. The function will be executed interactively on multiple cores using multiple computational threads, and with 12 workers from a MATLAB pool. A MATLAB script will be run in batch to loop with multiple computational threads, and again with a MATLAB pool.
Finally it will be compiled and deployed using the Matlab Compiler Runtime (MCR) environment.
We want to select only the real eigenvalues to compute the maximum. The matrix is a full matrix with both positive and negative elements, so the eigenvalues will be both real and complex. MATLAB has a function, ''isreal'', but it is useless for selecting real values from a complex array, since it tests the array as a whole rather than each element and returns false for a complex array. Thus we select the reals by the property that their imaginary part is 0.0. This may be subject to round-off errors, either by selecting complex numbers with very small imaginary parts, or by not selecting some real eigenvalues where the imaginary part is non-zero from rounding.
The last statement of this function does not have a semicolon. Thus, the value is displayed with three lines of output for every function call. This is not what you want once you are confident your code is producing good results. To make this function silent, just add a semicolon. To produce more information, packed into one line, you could add the ''fprintf'' function:
maxe = max(ev(imag(ev)==0));
fprintf('sd=%d counte=%d maxe=%.4f\n', sd, length(ev(imag(ev)==0)), maxe)
==== Matlab script ====
First, write a Matlab script file. It should have a comment on the first line describing the purpose of the script and have the ''quit'' command on the last line. This script will call the [[#matlab-function|maxEig function]] 200 times and report the average:
% script to run maxEig function 200 times and print average.
count = 200;
dim = 5001;
sumMaxe = 0;
tic;
for i=1:count;
sumMaxe = sumMaxe + maxEig(i,dim);
end;
toc
avgMaxEig = sumMaxe/count
quit
This is a detailed script example, which calls the maxEig function. This example does no file I/O; all the I/O is to standard out. In Matlab, assignments not terminated by a semicolon are displayed on the screen (standard out in batch).
Several MATLAB commands could be added to the beginning of this script to set the maximum number of computational threads to the number of slots assigned to your job. If the scheduler is using CGROUPS to limit your job core count, then these commands are not necessary.
[compThreads,count]=sscanf(getenv('NSLOTS'),'%d');
if count == 1
warning('off','MATLAB:maxNumCompThreads:Deprecated');
autoCompThreads = maxNumCompThreads(compThreads);
disp(sprintf('NumCompThreads=%d, was %d',compThreads,autoCompThreads))
end
See [[maxNumCompThreadsGridEngine|Setting maximum number of computational threads]]
This script ends in a **__quit__** command (equivalent to MATLAB **__exit__**). This is meant to be a complete script, which
terminates MATLAB when done. If you run this from the bash command line with the ''-r script'' option, it will come back with a bash prompt when completed. If this is run from a batch job, then you can run other commands in your batch script after the MATLAB script completes.
Without the **__quit__** you will come back to the MATLAB prompt on completion for an interactive job. If this is the last line of a batch queue script, then the only difference will be the MATLAB prompt ''>>'' at the very end of the output file. MATLAB treats the end of a batch job script file the same as exiting the window, which is the preferred way to exit the MATLAB GUI.
===== Copy the project folder =====
Copy the project folder to a directory on the cluster.
Use any [[:abstract:farber:transfer|file transfer client]] to copy your entire project directory.
====== Batch job ======
You should have a copy of your MATLAB [[#project directory]] on the cluster.
**Versions of MATLAB**
MATLAB has a new version twice a year. It is important to keep the version you use on your desktop the same as the
one on the cluster. The command
vpkg_versions matlab
will show you the versions available on a cluster. Choose the one that matches the version on your desktop. We recommend you do not upgrade MATLAB in the middle of a project, unless there is a new feature or bug fix you need.
**Two directories**
It is frequently advisable to keep your MATLAB project clean from non-MATLAB files such as the queue
script file and the script output file. But you may combine them, and even use the MATLAB editor to
create the script file and look at the output file.
If you create the file on a PC, take care to not transfer the files as binary. See Transfer Files for the appropriate cluster.
When you have one combined directory, do not put the ''cd'' command in the queue script; instead, change
to the project directory using ''cd'' on the command line, before submitting your job.
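A queue script written on a PC may arrive with Windows line endings, which bash does not accept. The sketch below is one way to check for and strip carriage returns after the transfer; the helper names ''has_crlf'' and ''strip_crlf'' are hypothetical, and the demo file is a stand-in for a real queue script.

```shell
# Succeed (exit 0) if the file contains a carriage return character.
has_crlf() {
    grep -q "$(printf '\r')" "$1"
}

# Write a copy of file $1 without carriage returns to file $2.
strip_crlf() {
    tr -d '\r' < "$1" > "$2"
}

# Demo on a temporary file that imitates a script saved with CRLF endings.
tmp=$(mktemp)
printf 'vpkg_require matlab\r\nmatlab -nodisplay -r main_script\r\n' > "$tmp"
if has_crlf "$tmp"; then
    strip_crlf "$tmp" "$tmp.unix"
    echo "carriage returns removed"
fi
rm -f "$tmp" "$tmp.unix"
```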
===== Create a job script file =====
You should create a job script file to submit a batch job. Start by modifying a job template file from ''/opt/shared/templates/gridengine''. For example, to submit a serial job on one core of a compute node, copy the serial template.
In your copy change the commented ''vpkg_require'' command to
require MATLAB, and then add your shell commands to the end of the file. Your copy may contain the lines:
# Add vpkg_require commands after this line:
vpkg_require matlab
# Now append all of your shell commands necessary to run your program
# after this line:
cd project_directory
matlab -nodisplay -singleCompThread -r main_script
The ''project_directory'' should have a file named ''main_script.m'' with your script. It could have just
one line **''display 'Hello World'''**.
===== Submit batch job =====
Your shell must be in a [[abstract:farber:app_dev:compute_env#using-workgroup-and-directories|workgroup environment]]
to submit any jobs.
Use the ''qsub'' command to submit a [[#batch-job|batch job]]
and note the ''<JOBID>'' that is assigned to your job. For example, if your queue script file name is ''matlab_first.qs'',
submit the job with:
qsub matlab_first.qs
**WARNING: Please choose a workgroup before submitting jobs**
This is the message you get if you are not in workgroup. Choose a workgroup with the ''workgroup'' command.
**Bash script vs queue script**
It is true that a queue script file is (usually) a bash script, but it must be executed with the ''qsub'' command instead of the ''sh'' command. This way the Grid Engine commands will be processed, and the job will be run on a compute node.
===== Wait for job to complete =====
You can [[abstract:farber:runjobs:job_status#checking-job-status|check on the status]] of your job with the ''qstat'' command.
For example, to list the information for job ''<JOBID>'', type
qstat -j <JOBID>
For long running jobs, you could change your queue script to notify you via an e-mail message when the job is
complete.
===== Post process job =====
All MATLAB output data files will be in the project directory, but the MATLAB standard output will be in
the current directory, from which you submitted the job. Look for a file ending in your assigned JOBID.
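Finding the standard output file for a given job can be scripted. This sketch assumes the naming seen elsewhere on this page (''script.m.oJOBID'' for a single job, ''name.oJOBID.TASK'' for array tasks); ''job_output_files'' is a hypothetical helper name.

```shell
# List Grid Engine standard-output files for one job ID.
# $1 = job ID, $2 = directory to search (defaults to the current directory).
# job_output_files is a hypothetical helper name.
job_output_files() {
    local jobid="$1" dir="${2:-.}"
    find "$dir" -maxdepth 1 -type f \
        \( -name "*.o${jobid}" -o -name "*.o${jobid}.*" \) | sort
}

# Example call, run from the directory where the job was submitted:
#   job_output_files 2362
```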
====== Interactive job ======
Here are specific details for running MATLAB as an interactive job on a compute node. You should have a copy of your [[#MATLAB project directory]] on the cluster, which will be referred to as ''project_directory'' in the examples below.
===== Command-line =====
You should work on a compute node when in command-line MATLAB.
Your shell must be in a [[abstract:farber:app_dev:compute_env#using-workgroup-and-directories|workgroup environment]]
to submit a single threaded interactive job using ''qlogin''.
qlogin
vpkg_require matlab
cd project_directory
matlab -nodesktop -singleCompThread
This will start an interactive command-line session in your terminal window. When done, type ''quit'' or ''exit'' to terminate the MATLAB session and then ''exit'' to terminate the qlogin session.
===== Desktop =====
You should be on a compute node before you start MATLAB.
To start a MATLAB desktop (GUI mode) on a cluster, you must be running an X11 server and you must have
[[abstract:farber:system_access:system_access|connected using X11 tunneling]].
Your shell must be in a [[abstract:farber:app_dev:compute_env#using-workgroup-and-directories|workgroup environment]]
to submit a job using ''qlogin''.
qlogin -l exclusive=1
vpkg_require matlab
cd project_directory
matlab
This will start an interactive DESKTOP session on your X11 screen. When done, type ''quit'' or ''exit'' in the command window or just close the window. When back at the terminal bash prompt, type ''exit'' to terminate the qlogin session.
See [[software:matlab:interactivetips|tips on starting Matlab]] in an interactive session without the desktop, including executing a script.
====== Compiling with MATLAB ======
We show the three most common ways to work with compilers when using MATLAB.
- Compiling your MATLAB code to run in the MCR (Matlab Compiler Runtime)
- Compiling your C or Fortran program to call the MATLAB engine
- Compiling your own function in C or Fortran to be used in a MATLAB session
Make sure your compiler is no older than the one required by your MATLAB version. In these examples MATLAB requires gcc 4.7 or newer. You may get the warning:
Warning: You are using gcc version '4.9.3'. The version currently supported
with MEX is '4.7.x'. For a list of currently supported compilers see:
http://www.mathworks.com/support/compilers/current_release.
But the compilation completes successfully.
===== Compiling your MATLAB =====
There is an example MCR project in the ''/opt/shared/templates/'' directory for you to copy and try. Copy on the head node and qlogin to compile with MATLAB. Once your program is compiled you can run it interactively or in batch, without needing a MATLAB license.
==== Copy dev-projects template ====
On the head node
cp -r /opt/shared/templates/dev-projects/MCR .
cd MCR
==== Compile with make ====
Now compile on the compute node by using
qlogin
make
Remember you must be in a workgroup before using ''qlogin''
Resulting output from the make command:
Adding package `mcr/r2014b-nojvm` to your environment
make[1]: Entering directory `/home/work/it_css/traine/matlab/MCR'
mcc -o maxEig -R "-nojvm,-nodesktop,-singleCompThread" -mv maxEig.m
Compiler version: 5.2 (R2014b)
Dependency analysis by REQUIREMENTS.
Parsing file "/home/work/it_css/traine/matlab/MCR/maxEig.m"
(Referenced from: "Compiler Command Line").
Deleting 0 temporary MEX authorization files.
Generating file "/home/work/it_css/traine/matlab/MCR/readme.txt".
Generating file "run_maxEig.sh".
make[1]: Leaving directory `/home/work/it_css/traine/matlab/MCR'
Take note of the package added, and the files that are generated. You can remove these files, as they are not needed.
You must add the package in your batch script or to test interactively.
==== Test interactively ====
To test interactively on the same compute node:
vpkg_require mcr/r2014b-nojvm
time ./maxEig 20.8
This example is designed as a test for batch computing, and takes about 15 minutes to complete. If you
change the MATLAB statement dim=10000 to dim=1000, and recompile, it will take about 10 seconds.
==== Back to the head node ====
When done, exit the compute node.
exit
==== Copy array job example ====
cp /opt/shared/templates/gridengine/matlab-mcr.qs .
vi matlab-mcr.qs
diff /opt/shared/templates/gridengine/matlab-mcr.qs matlab-mcr.qs
The ''diff'' output shows changes made in the ''vi'' session:
46c46
< # -l m_mem_free=5G
---
> #$ -l m_mem_free=3G
51c51
< # -t 1-4
---
> #$ -t 1-100
63c63,64
< vpkg_require mcr/r2014b-nojvm
---
> vpkg_require mcr/r2015a-nojvm
> let lambda="$SGE_TASK_ID-1"
79c80
< MCR_EXECUTABLE_FLAGS=("$RANDOM")
---
> MCR_EXECUTABLE_FLAGS=("$lambda")
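The ''let lambda="$SGE_TASK_ID-1"'' line in the diff gives each array task its own argument, so tasks 1 to 100 pass the values 0 to 99. The sketch below restates that mapping as a function so it can be tried outside Grid Engine; the name ''task_to_lambda'' is hypothetical, and inside a real array task the argument would be ''$SGE_TASK_ID''.

```shell
# Map a 1-based array task number to the 0-based value passed to the program.
# task_to_lambda is a hypothetical helper; in the queue script the same
# arithmetic is done with: let lambda="$SGE_TASK_ID-1"
task_to_lambda() {
    echo $(( $1 - 1 ))
}

task_to_lambda 50    # prints 49
```

Because every task derives its argument from its own task number, no two tasks repeat the same computation.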
To submit a standby array job that has 100 tasks:
qsub -l standby=1 matlab-mcr.qs
Example
[(it_css:traine)@farber MCR]$ qsub -l standby=1 matlab-mcr.qs
Your job-array 627074.1-100:1 ("matlab-mcr.qs") has been submitted
[(it_css:traine)@farber MCR]$ date
Mon Apr 11 14:56:26 EDT 2016
[(it_css:traine)@farber MCR]$ date
Mon Apr 11 15:17:33 EDT 2016
[(it_css:traine)@farber MCR]$ ls -l matlab-mcr.qs.o627074.* | wc -l
100
There are 100 output files with the names matlab-mcr.qs.o627074.1 to matlab-mcr.qs.o627074.100
For example file 50:
[CGROUPS] UD Grid Engine cgroup setup commencing
[CGROUPS] Setting 3221225472 bytes (vmem none bytes) on n106 (master)
[CGROUPS] with 1 core =
[CGROUPS] done.
Adding package `mcr/r2015a-nojvm` to your environment
GridEngine parameters:
MCR_ROOT = /opt/shared/matlab/r2015a
MCR executable = /home/work/it_css/traine/matlab/MCR/maxEig
flags = 49
MCR_CACHE_ROOT = /tmp/627074.50.standby.q
-- begin maxEig run --
maxe =
5.0243e+03
-- end maxEig run --
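Counting the output files confirms that all tasks wrote something, but when the count is short it helps to know which tasks are missing. This is a sketch under the assumption that each task writes a file named ''<prefix>.<task>'', as in the listing above; ''missing_tasks'' is a hypothetical helper name.

```shell
# Print the task numbers in 1..ntasks that have no output file "<prefix>.<task>".
# $1 = file name prefix (for example matlab-mcr.qs.o627074), $2 = number of tasks.
# missing_tasks is a hypothetical helper name.
missing_tasks() {
    local prefix="$1" ntasks="$2" t=1
    while [ "$t" -le "$ntasks" ]; do
        [ -e "${prefix}.${t}" ] || echo "$t"
        t=$(( t + 1 ))
    done
}

# Example call, run in the directory holding the output files:
#   missing_tasks matlab-mcr.qs.o627074 100
```

An empty result means every task produced an output file.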
[[more examples]]
===== Compiling your code to use MATLAB engine ======
There is a simple example function ''**fengdemo.F**'', coded in Fortran, that you can copy and use as a starting point.
On the head node and in a workgroup shell:
vpkg_require matlab/r2015a gcc/4.9
cp $MATLABROOT/extern/examples/eng_mat/fengdemo.F .
export LD_LIBRARY_PATH=$MATLABROOT/bin/glnxa64:$MATLABROOT/sys/os/glnxa64:$LD_LIBRARY_PATH
mex -client engine fengdemo.F
To start MATLAB on a compute node to test this new program:
qlogin
vpkg_require matlab/r2015a gcc/4.9
export LD_LIBRARY_PATH=$MATLABROOT/bin/glnxa64:$MATLABROOT/sys/os/glnxa64:$LD_LIBRARY_PATH
./fengdemo
exit
Step one of the fengdemo should give the plot:
{{:software:figure_1.png?400|}}
Step two should give the table:
MATLAB computed the following distances:
time(s) distance(m)
1.00 -4.90
2.00 -19.6
3.00 -44.1
4.00 -78.4
5.00 -123.
6.00 -176.
7.00 -240.
8.00 -314.
9.00 -397.
10.0 -490.
===== Compiling your own MATLAB function ======
There is a simple example function ''**timestwo.c**'', coded in C, that you can copy and use as a starting point.
On the head node and in a workgroup shell:
vpkg_require matlab/r2015a gcc/4.9
cp $MATLABROOT/extern/examples/refbook/timestwo.c .
mex timestwo.c
To start MATLAB on a compute node to test this new function:
qlogin
vpkg_require matlab/r2015a gcc/4.9
matlab -nodesktop
timestwo(4)
quit
exit
You should get the answer
>> timestwo(4)
ans =
8
>>
====== Batch job serial example ======
Second, write a shell script file to set the Matlab environment and start Matlab running your script file. The following script file will set the Matlab environment and run the command in the [[#matlab-script|script.m]] file:
#$ -N script.m
#$ -m eas
#$ -M traine@gmail.com
#$ -l exclusive=1
vpkg_require matlab/r2014b
matlab -nodisplay -nojvm -r script
The ''-nodisplay'' option indicates no X11 graphics, which implies ''-nosplash -nodesktop''. The ''-nojvm'' option indicates no Java. (Java is needed for some functions, e.g., print graphics, but should be excluded for most computational jobs.)
The ''-r'' option is followed by a Matlab command, enclosed in quotes when there are spaces in the command.
**Exclusive access to node**:
The ''-l exclusive=1'' tells the scheduler to wait until your job can get exclusive access to the node. Since your job is the only job on the node, it can use all the memory and all the cores. Matlab assumes you want to use the full node to run as fast as possible. The goal is to reduce real time (wall clock time), not CPU time. When you use exclusive access you should monitor the job to see the average core count and the maximum memory usage. With hindsight, this job should have used:
#$ -pe threads 5
#$ -l m_mem_free=1G
If everyone in your group carefully sets these values, multiple jobs can run concurrently on the node.
See [[maxNumCompThreadsGridEngine|Setting maximum number of computational threads]]
**Errors in the Matlab script**:
The command ''script'' will execute the lines in the ''script.m'' file. For some errors Matlab will display the error message and wait for a response -- clearly not appropriate for a batch job. Consider replacing ''script'' with
the compound command
"try; script; catch err; disp(getReport(err,'extended')); quit; end"
The purpose of the **''try/catch''** block is to catch the first error in the script, and display a report. With the **''extended''** option the report will include a stack trace at the point of the error.
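Quoting the compound command inside a queue script is easy to get wrong, because it mixes double quotes for the shell with single quotes for MATLAB. This sketch builds the string shown above for any script name; ''matlab_try_cmd'' is a hypothetical helper name.

```shell
# Wrap a MATLAB script name in the try/catch compound command shown above.
# matlab_try_cmd is a hypothetical helper name.
matlab_try_cmd() {
    printf "try; %s; catch err; disp(getReport(err,'extended')); quit; end\n" "$1"
}

matlab_try_cmd script
# In a queue script the result would be used as:
#   matlab -nodisplay -nojvm -r "$(matlab_try_cmd script)"
```

Note that ''quit'' appears only in the ''catch'' branch, so the MATLAB script itself still needs to end with ''quit'', as described earlier on this page.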
**Graphics in the Matlab script**
* Do not include the ''-nojvm'' on the **matlab** command.
* Do set paper dimensions and print each figure to a file.
The text output will be included in the standard Grid Engine output file, but not any graphics. All figures must be exported using the **print** command. Normally the **print** command will print on an 8 1/2 by 11 inch page with margins that are for a printed page of paper. The size and margins will not work if you plan to include the figure in a paper or a web page.
We suggest setting the current figure's ''PaperUnits'', ''PaperSize'' and ''PaperPosition''. Matlab provides a handle to the current figure (**gcf**). For example, the commands
set(gcf,'PaperUnits','inches','PaperSize',[4,3],'PaperPosition',[0 0 4 3]);
print('-dpng','-r100','maxe.png');
will set the current figure to be 4 x 3 inches with no margins, and then print the figure as a 400x300 resolution ''png'' file.
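The pixel size follows from the paper size and the ''-r'' resolution: 4 inches at 100 dots per inch is 400 pixels, and 3 inches is 300 pixels. A small sketch of that arithmetic, with the hypothetical helper name ''figure_pixels'':

```shell
# Pixel dimensions of a printed figure: inches times dots per inch.
# $1 = width in inches, $2 = height in inches, $3 = resolution (the -r value).
# figure_pixels is a hypothetical helper name.
figure_pixels() {
    awk -v w="$1" -v h="$2" -v dpi="$3" \
        'BEGIN { printf "%.0fx%.0f\n", w * dpi, h * dpi }'
}

figure_pixels 4 3 100    # prints 400x300
```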
==== Submit job ====
Third, from the directory with both ''script.m'' and ''batch.qs'', submit the batch job with the command:
qsub batch.qs
You should specify required [[#matlab-license-information-for-grid-engine|Matlab licenses for GridEngine]] as a resource, especially if there are a limited number of license seats available for particular toolboxes.
In this example you will only need a license for the base Matlab, and the parallel toolbox needs one license. We are using the default local scheduler which will give you workers on the same node with one license.
**Toolbox dependencies**
You should include toolbox dependencies in your batch script too to help avoid a failure, which will occur if the job starts with no [[matlab#license-information|licenses]] available.
For example, the Bioinformatics toolbox only has one seat, and in addition it requires the Statistics and Machine Learning toolbox, as well as the core MATLAB. So you would add the line:
#$ -l MLM.MATLAB=1,MLM.Statistics_Toolbox=1,MLM.Bioinformatics_Toolbox=1
to your job script.
==== Wait for completion ====
Finally, wait for the mail notification, which will be sent to ''traine@gmail.com''. When the job is done the output from the Matlab command will be in a file with the pattern - ''script.m.oJOBID'', where JOBID is the number assigned to your job.
After waiting for about 2 1/2 hours, a message was received with subject line "Grid Engine Job Scheduler":
Job 2362 (script.m) Complete
User = traine
Queue = it_css.q@n038
Host = n038.farber.hpc.udel.edu
Start Time = 10/21/2014 14:45:42.100
End Time = 10/21/2014 17:09:24.782
User Time = 12:41:56
System Time = 00:11:31
Wallclock Time = 02:23:42
CPU = 12:53:27
Max vmem = 3.924G
Exit Status = 0
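The average core count suggested for monitoring exclusive jobs can be estimated from this message by dividing the CPU time by the wall-clock time. The sketch below does that arithmetic; ''average_cores'' is a hypothetical helper name, and the ratio is only an average over the whole run.

```shell
# Average number of busy cores = CPU time / wall-clock time.
# Both arguments are HH:MM:SS strings as printed in the job summary.
# average_cores is a hypothetical helper name.
average_cores() {
    awk -v cpu="$1" -v wall="$2" '
        function secs(t,   p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }
        BEGIN { printf "%.2f\n", secs(cpu) / secs(wall) }'
}

average_cores 12:53:27 02:23:42    # prints 5.38
```

An average of about 5.4 busy cores is consistent with the ''-pe threads 5'' request suggested with hindsight for this job.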
==== Gather results ====
The results for Job 2362 are in the file ''script.m.o2362'':
[CGROUPS] No /cgroup/memory/UGE/2362.1 exists for this job
[CGROUPS] UD Grid Engine cgroup setup commencing
[CGROUPS] Setting none bytes (vmem none bytes) on n038 (master)
[CGROUPS] with 20 cores = 0-19
[CGROUPS] done.
Adding package `matlab/r2014b` to your environment
< M A T L A B (R) >
Copyright 1984-2014 The MathWorks, Inc.
R2014b (8.4.0.150421) 64-bit (glnxa64)
September 15, 2014
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
maxe =
70.0220
... //Skipping 198 similar displays of variable maxe//
maxe =
67.4221
Elapsed time is 8618.393954 seconds.
avgMaxEig =
69.5131
==== Timings and core count ====
Consider a batch job run with the two Grid Engine options:
-pe threads 5
-l m_mem_free=1G
The ''qsub'' command will give you the job id, and once it starts running, the ''qstat'' command will give you the node you are running on - ''n=n038''. After about 10 minutes of running:
$ ssh $n ps -eo pid,ruser,pcpu,pmem,thcount,stime,time,command | egrep '(COMMAND|matlab)'
PID RUSER %CPU %MEM THCNT STIME TIME COMMAND
29207 traine 180 0.8 10 11:00 00:09:40 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -r script -nojvm
This ''ps'' command will give the percent CPU, which is ''>100%'' for multi-core jobs, the percent memory, the thread count, which is > 5, the start time, the execution time, and finally the full command used to start the job.
Given the reported PID, 29207, you can drill down and see which of the 10 threads are consuming CPU time:
$ ssh $n ps -eLf | egrep '(PID|29207)' | grep -v ' 0 '
UID PID PPID LWP C NLWP STIME TTY TIME CMD
traine 29207 29076 29257 99 10 11:00 ? 00:06:55 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -r script -nojvm
traine 29207 29076 29264 22 10 11:00 ? 00:01:31 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -r script -nojvm
traine 29207 29076 29265 22 10 11:00 ? 00:01:33 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -r script -nojvm
traine 29207 29076 29266 22 10 11:00 ? 00:01:31 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -r script -nojvm
traine 29207 29076 29267 22 10 11:00 ? 00:01:32 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -r script -nojvm
While the batch job was running on node ''n=n038'', the ''top'' command was run once in batch mode (''-b -n 1'') to sample the resources being used by Matlab.
The ''-H'' option was used to display individual threads, rather than a summary of all threads in a process.
$ ssh $n top -H -b -n 1 | egrep '(COMMAND|MATLAB)' | grep -v 'S 0'
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
29257 traine 20 0 1698m 577m 73m R 99.5 0.9 112:27.82 MATLAB
29266 traine 20 0 1698m 577m 73m S 7.8 0.9 30:24.10 MATLAB
29264 traine 20 0 1698m 577m 73m S 5.9 0.9 30:24.49 MATLAB
29265 traine 20 0 1698m 577m 73m S 5.9 0.9 30:24.63 MATLAB
29267 traine 20 0 1698m 577m 73m S 5.9 0.9 30:27.43 MATLAB
29263 traine 20 0 1698m 577m 73m S 2.0 0.9 1:15.25 MATLAB
The ''mpstat'' command below samples per-core CPU usage at one-second intervals, two times, and reports the average:
$ ssh $n mpstat -P ALL 1 2
Linux 2.6.32-431.23.3.el6.x86_64 (n038) 04/28/2015 _x86_64_ (20 CPU)
Average: CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
Average: all 7.06 0.00 0.18 0.00 0.00 0.00 0.00 0.00 92.77
Average: 0 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: 1 10.66 0.00 0.00 0.00 0.00 0.00 0.00 0.00 89.34
Average: 2 10.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 90.00
Average: 3 9.55 0.00 0.50 0.00 0.00 0.00 0.00 0.00 89.95
Average: 4 10.45 0.00 0.00 0.00 0.00 0.00 0.00 0.00 89.55
Average: 5 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
Average: 6 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
Average: 7 0.00 0.00 0.50 0.00 0.00 0.00 0.00 0.00 99.50
Average: 8 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
Average: 9 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
Average: 10 0.00 0.00 0.50 0.00 0.00 0.00 0.00 0.00 99.50
Average: 11 0.50 0.00 2.00 0.00 0.00 0.00 0.00 0.00 97.50
Average: 12 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
Average: 13 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
Average: 14 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
Average: 15 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
Average: 16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
Average: 17 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
Average: 18 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
Average: 19 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
qhost -h $n
HOSTNAME ARCH NCPU NSOC NCOR NTHR NLOAD MEMTOT MEMUSE SWAPTO SWAPUS
----------------------------------------------------------------------------------------------
global - - - - - - - - - -
n038 lx-amd64 20 2 20 20 0.10 63.0G 11.4G 2.0G 18.3M
After the job is done you can use ''qacct'' to get a recap of resources used:
$ qacct -h n038 -j 64501 | egrep '(maxvmem|maxrss|cpu|wallclock)'
ru_wallclock 9088.920
ru_maxrss 591764
cpu 18986.828
maxvmem 1.673G
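The ''qacct'' fields can be turned into the same kind of summary: ''cpu'' divided by ''ru_wallclock'' gives the average number of busy cores, which can be compared with the slots requested. This sketch assumes job 64501 is the 5-slot job described above, which the page does not state explicitly; ''qacct_cores'' is a hypothetical helper name.

```shell
# Read qacct output on standard input and report average busy cores,
# plus that figure as a percentage of the requested slots.
# $1 = number of slots requested with -pe threads (an assumption of the caller).
# qacct_cores is a hypothetical helper name.
qacct_cores() {
    awk -v slots="$1" '
        $1 == "ru_wallclock" { wall = $2 }
        $1 == "cpu"          { cpu = $2 }
        END {
            if (wall <= 0 || slots <= 0) exit 1
            cores = cpu / wall
            printf "%.2f cores, %.0f%% of %d slots\n", cores, 100 * cores / slots, slots
        }'
}

# The four lines of qacct output shown above stand in for a live query.
printf '%s\n' 'ru_wallclock 9088.920' 'ru_maxrss 591764' \
              'cpu 18986.828' 'maxvmem 1.673G' | qacct_cores 5
```

On the cluster the input would come from ''qacct -j <JOBID>'' instead of the copied lines.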
====== Batch parallel example ======
The Matlab parallel toolbox uses the JVM to manage the workers and communicate while you are running. You
need to set up the Matlab pools in your ''script''.
==== Matlab parallel script ====
Here is the slightly modified MATLAB script.
Add a ''parpool'' command to start the pool and a ''delete'' command to shut it down, and change ''for'' ⇒ ''parfor''.
% script to run maxEig function 200 times
mypool=parpool(20);
count = 200;
dim = 5001;
sumMaxe = 0;
tic;
parfor i=1:count;
sumMaxe = sumMaxe + maxEig(i,dim);
end;
toc
avgMaxEig = sumMaxe/count
delete(mypool);
exit
==== Grid engine parallel script ====
Take out ''-nojvm'', because the JVM is needed for the parpool, and require the distributed computing toolbox.
#$ -N pscript
#$ -m eas
#$ -M traine@gmail.com
#$ -l m_mem_free=3.1G
#$ -l MLM.Distrib_Computing_Toolbox=1
#$ -pe threads 20
vpkg_require matlab/r2015a
matlab -nodisplay -r pscript
==== Timing results ====
Reported usage for same job run using the parallel toolbox.
Job 618746 (pscript) Complete
User = traine
Queue = spillover.q@n010
Host = n010.farber.hpc.udel.edu
Start Time = 03/31/2016 11:01:53.776
End Time = 03/31/2016 11:21:28.937
User Time = 06:02:34
System Time = 00:01:00
Wallclock Time = 00:19:35
CPU = 06:03:35
Max vmem = 80.513G
Exit Status = 0
Compare script vs pscript
^ Job ^ Wallclock Time ^ CPU ^ Max vmem ^
| script | 02:23:42 | 12:53:27 | 3.924G |
| pscript | 00:19:35 | 06:03:35 | 80.513G |
The job **script** used more CPU resources with the multiple computational threads, while **pscript** used more memory resources with 20 single-threaded workers.
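The gain in time to completion can be expressed as a speedup, the ratio of the two wall-clock times in the table. A minimal sketch of that calculation, with the hypothetical helper name ''speedup'':

```shell
# Speedup = wall-clock time of the first run / wall-clock time of the second.
# Both arguments are HH:MM:SS strings as printed in the job summaries.
# speedup is a hypothetical helper name.
speedup() {
    awk -v a="$1" -v b="$2" '
        function secs(t,   p) { split(t, p, ":"); return p[1] * 3600 + p[2] * 60 + p[3] }
        BEGIN { printf "%.2f\n", secs(a) / secs(b) }'
}

speedup 02:23:42 00:19:35    # prints 7.34
```

So the parallel run finished roughly 7 times sooner, at the cost of about 20 times the memory (80.513G against 3.924G).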
====== Interactive example ======
The basic steps to run [[:software:matlab#interactive-job|MATLAB]] interactively on a compute node.
This demo starts in your MATLAB directory with an active workgroup.
==== Scheduling exclusive interactive job ====
$ qlogin -l exclusive=1
Your job 2493 ("QLOGIN") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 2493 has been successfully scheduled.
Establishing /opt/shared/univa/local/qlogin_ssh session to host n036 ...
==== Starting a command mode matlab session ====
$ vpkg_require matlab/r2014b
Adding package `matlab/r2014b` to your environment
$ matlab -nodesktop -nosplash
MATLAB is selecting SOFTWARE OPENGL rendering.
< M A T L A B (R) >
Copyright 1984-2014 The MathWorks, Inc.
R2014b (8.4.0.150421) 64-bit (glnxa64)
September 15, 2014
To get started, type one of these: helpwin, helpdesk, or demo.
For product information, visit www.mathworks.com.
==== Using help as the first command ====
>> help maxEig
maxEig Maximum Eigenvalue of a random matrix
Input parameters
sd - seed for uniform random generator
dim - size of the square matrix (should be odd)
Output value
maxe - maximum real eigvalue
==== Calling function once ====
Use the tic and toc commands to report the elapsed time to generate the random matrix, find all eigenvalues and report the maximum real eigenvalue.
>> tic; maxEig(1,5001); toc
maxe =
70.0220
Elapsed time is 54.781289 seconds.
==== Finishing up ====
>> exit
$ exit
Connection to n036 closed.
/opt/shared/univa/local/qlogin_ssh exited with exit code 0
===== Interactive parallel toolbox example =====
When you plan to use the parallel toolbox, you should log on exclusively to a compute node with the command:
qlogin -l exclusive=1
This will effectively reserve the entire node for your MATLAB workers. The default number of parallel workers is 12, but you can ask for more, up to the number of cores on the node when using the local scheduler.
Here we start 20 workers with the ''parpool'' function, and then use ''parfor'' to send a different seed to each worker. The output appears as the workers complete, so its order is not deterministic.
**Make sure the workers are not doing exactly the same computations.** In this example, the different seed passed to the function causes all the random values to be different on each worker.
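It took about 100 seconds for each of the 20 workers to produce one result. Since they are working in parallel, the elapsed time to complete 200 results is about 10 × 100 = 1000 seconds, which is close to the measured elapsed time reported by ''toc''.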
>> parpool(20);
Starting parallel pool (parpool) using the 'local' profile ... connected to 20 workers.
>> tic; parfor sd = 1:200; maxEig(sd,5001); end; toc
maxe =
70.2345
maxe =
69.9007
maxe =
71.2040
skipped lines
maxe =
70.1443
maxe =
71.2327
maxe =
66.3099
Elapsed time is 1087.729851 seconds.
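A rough estimate of the elapsed time for a ''parfor'' loop is the number of rounds (iterations divided by workers, rounded up) times the time for one iteration. The sketch below assumes every iteration takes the same time and that the workers proceed in lockstep, which is a simplification; ''estimate_parfor_seconds'' is a hypothetical helper name.

```shell
#!/bin/bash
# Estimate the elapsed time of a parfor loop, assuming every iteration takes
# the same time and the workers finish each round together (a simplification).
# usage: estimate_parfor_seconds ITERATIONS WORKERS SECONDS_PER_ITERATION
estimate_parfor_seconds() {
    local iterations=$1 workers=$2 per_iter=$3
    # Integer ceiling of iterations / workers gives the number of rounds.
    local rounds=$(( (iterations + workers - 1) / workers ))
    echo $(( rounds * per_iter ))
}

# 200 seeds on 20 workers at about 100 seconds per result.
estimate_parfor_seconds 200 20 100
```

The estimate of 1000 seconds is somewhat below the measured 1087.7 seconds; the difference presumably covers overhead and uneven iteration times.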
====== MCR array job example ======
Most Matlab functions can be compiled using the Matlab Compiler (MCC) and then deployed to run on the compute nodes in the MATLAB Compiler Runtime (MCR). The MCR is a prerequisite for deployment, and is installed on all the compute nodes. You must use VALET to set up the libraries you will need to run your function from the command line. You do not need to use the shell script (''.sh'' file) that the compiler creates.
There are two ways to run compiled MATLAB jobs in a shared environment, such as Mills and Farber.
- Compile to produce an executable that uses a single computational thread - MATLAB option ''-singleCompThread''
- Submit the job to use the nodes exclusively - Grid engine option ''-l exclusive=1''
You can run more jobs on each node when they are compiled to use just one core (Single Comp Thread). This will give
you higher throughput for an array job, but not higher performance for each individual task.
==== Example compiler commands ====
The [[#matlab-function|maxEig function]] has a conditional statement to make it work when deployed.
if (isdeployed)
sd = str2num(sd)
dim = str2num(dim)
end
All arguments of the function are taken as tokens on the shell command used to execute the script,
and they are all strings. You must convert numbers from strings to numbers. You can use the same variable names so
that the rest of the script will behave the same when deployed or executed directly in Matlab.
You can convert this function into a single-computational-thread executable by using the Matlab compiler ''mcc''. Type the commands
prog=maxEig
opt='-nojvm,-nodisplay,-singleCompThread'
version='r2015a'
vpkg_require matlab/$version
mcc -R "$opt" -mv $prog.m
[ -d $WORKDIR/sw/bin ] && mv $prog $WORKDIR/sw/bin
**Keep these commands in a file**: Even though this is just two commands, we recommend you keep these commands, including the shell assignment statements, as a record of the MATLAB version and options you used to create the executable ''maxEig''. You will need to know these if you want to use the executable in a shell script. You can source this file when you want to rebuild ''maxEig''.
**You can get mcc usage instructions with ''mcc -help''**: The string following the ''-R'' flag lists the Matlab
options you want to use at run time. The ''-m'' option tells mcc to build a standalone application to be deployed
using MCR. The ''-v'' option is for verbose mode.
You cannot execute a file from a directory on the ''lustre'' file system. That is why the executable **''$prog''** is moved to the special directory, which is added to your path when a new workgroup shell is started or when a queue script is submitted.
[ -d $WORKDIR/sw/bin ] && mv $prog $WORKDIR/sw/bin
==== Compiling commands ====
[(it_css:traine)@farber matlabApr1]$ qlogin
Your job 619145 ("QLOGIN") has been submitted
waiting for interactive job to be scheduled ...
Your interactive job 619145 has been successfully scheduled.
Establishing /opt/shared/univa/local/qlogin_ssh session to host n039 ...
[(it_css:traine)@n039 matlabApr1]$ . compile.sh
Adding package `matlab/r2016a` to your environment
Compiler version: 6.2 (R2016a)
Dependency analysis by REQUIREMENTS.
Parsing file "/home/work/it_css/traine/matlabApr1/maxEig.m"
(Referenced from: "Compiler Command Line").
Deleting 0 temporary MEX authorization files.
Generating file "/home/work/it_css/traine/matlabApr1/readme.txt".
Generating file "run_maxEig.sh".
[(it_css:traine)@n039 matlabApr1]$ ls
compile.sh mccExcludedFiles.log run_maxEig.sh
maxEig readme.txt script.m
maxEig.m requiredMCRProducts.txt stackTrace.m
[(it_css:traine)@n039 matlabApr1]$ exit
exit
Connection to n039 closed.
/opt/shared/univa/local/qlogin_ssh exited with exit code 0
==== Example queue script file ====
The ''mcc'' command will generate a ''.sh'' file, which you can use to set up your environment and run the command. This does not use VALET and does not have any Grid Engine commands in it. We suggest you use the Grid Engine template in the file
/opt/shared/templates/gridengine/matlab-mcr.qs
or modify this simple example:
#$ -N maxEig
#$ -t 1-200
#$ -l m_mem_free=3.1G
#
# Parameter sweep array job to run the maxEig compiled MATLAB function with
# seed = 1, 2, ..., 200
#
date "+Start %s"
echo "Host $HOSTNAME"
vpkg_require mcr/r2014b-nojvm
export MCR_CACHE_ROOT="$TMPDIR"
let seed=$SGE_TASK_ID
let dim=5001
./maxEig $seed $dim
date "+Finish %s"
The two ''date'' commands record the start and finish time in seconds for each task. These can be used to compute the elapsed time, and the echoed host name can be used to calculate the overlapping use of the compute nodes. Since ''maxEig'' was compiled as a single-threaded job, the elapsed (wall clock) time will be very close to the CPU time. We do not send email notification since it would generate 200 email messages, one for each task.
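The elapsed time for one task can be recovered from the ''Start'' and ''Finish'' lines in its output file. The following is a minimal sketch, assuming each output file contains exactly one ''Start <seconds>'' line and one ''Finish <seconds>'' line as written by the queue script; ''task_elapsed'' is a hypothetical helper name and the file contents are a stand-in.

```shell
#!/bin/bash
# Print the elapsed seconds for one task output file, using the
# "Start <seconds>" and "Finish <seconds>" lines written by the date commands.
task_elapsed() {
    awk '$1 == "Start"  { start = $2 }
         $1 == "Finish" { finish = $2 }
         END { if (start != "" && finish != "") print finish - start }' "$1"
}

# Example with a small stand-in file holding only the three lines of interest.
tmp=$(mktemp)
printf 'Start 1414171807\nHost n036\nFinish 1414171902\n' > "$tmp"
task_elapsed "$tmp"
rm -f "$tmp"
```

For this stand-in file the function prints 95. If a task was killed before the ''Finish'' line was written, nothing is printed for it.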
==== Compiled Matlab in owner queues ====
To test the example compiled Matlab job on the ''it_css'' owner queues, we first compiled the code with ''mcc'' and
then submitted it with ''qsub''. The job was assigned number 3731. After a few minutes, 200 files were created in the current directory.
maxEig.o3731.1 ... maxEig.o3731.200
Each file has the output of one task. For example, for task ID 125:
[CGROUPS] UD Grid Engine cgroup setup commencing
[CGROUPS] Setting 5368709120 bytes (vmem 5368709120 bytes) on n036 (master)
[CGROUPS] with 1 core = 5
[CGROUPS] done.
Start 1414171807
Host n036
Adding package `mcr/r2014b-nojvm` to your environment
sd =
125
dim =
5001
maxe =
70.4891
Finish 1414171902
Now we gather all the information from these files and write a data file with three columns:
sd dim maxe
1 5001 70.0220
2 5001 71.7546
3 5001 70.8331
4 5001 70.5714
....
199 5001 70.7535
200 5001 67.4221
and print the average
avgMaxEig = 69.5131125
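One way to collect the three columns and the average is sketched below. It is not the script used to produce the listing above; it assumes each task file shows a variable as a ''name ='' line followed by its value, and the helper names ''gather_results'' and ''average_maxe'' and the stand-in files are hypothetical.

```shell
#!/bin/bash
# Print a header and one "sd dim maxe" row per task output file,
# sorted numerically by seed.
gather_results() {
    local f
    echo "sd dim maxe"
    for f in "$@"; do
        awk '
            $2 == "=" && NF == 2 { name = $1; next }   # a line such as "maxe ="
            name != "" && NF > 0 { val[name] = $1; name = "" }
            END { print val["sd"], val["dim"], val["maxe"] }
        ' "$f"
    done | sort -n
}

# Read the three-column data (with header) on stdin and print the average maxe.
average_maxe() {
    awk 'NR > 1 { sum += $3; n++ }
         END { if (n > 0) printf "avgMaxEig = %.7f\n", sum / n }'
}

# Example with two small stand-in files in a temporary directory.
dir=$(mktemp -d)
printf 'sd =\n     1\ndim =\n        5001\nmaxe =\n   70.0220\n' > "$dir/maxEig.o3731.1"
printf 'sd =\n     2\ndim =\n        5001\nmaxe =\n   71.7546\n' > "$dir/maxEig.o3731.2"
gather_results "$dir"/maxEig.o3731.* > "$dir/results.txt"
cat "$dir/results.txt"
average_maxe < "$dir/results.txt"
rm -r "$dir"
```

The ''sort -n'' step matters for a real run because the shell expands the file names in lexical order (1, 10, 100, ...), not in task order.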
The average is the same result we got from both the MATLAB loop and the parallel toolbox, but the 200 values were computed
in just over 3 1/2 minutes. To see this we gather the start/finish times in seconds and the host name.
=== SGE array job started Fri 24 Oct 2014 01:28:37 PM EDT ===
Used a total of 18977 CPU seconds over 219 seconds of elapsed time
on 10 nodes
^ Node ^^ Real Clock Time ^^^ Ratio ^
^ Name ^ Count ^ Min ^ Max ^ Average ^ User/Real ^
| n036| 24| 80.00| 99.00| 88.96 | 1.00000|
| n038| 24| 77.00| 110.00| 94.21 | 1.00000|
| n040| 24| 78.00| 108.00| 93.21 | 1.00000|
| n084| 24| 75.00| 114.00| 95.21 | 1.00000|
| n085| 19| 73.00| 127.00| 99.58 | 1.00000|
| n086| 19| 70.00| 125.00| 98.26 | 1.00000|
| n089| 8| 76.00| 78.00| 77.00 | 1.00000|
| n090| 24| 78.00| 115.00| 95.75 | 1.00000|
| n092| 19| 74.00| 123.00| 95.74 | 1.00000|
|test-gpu| 15| 74.00| 126.00| 104.47 | 1.00000|
Using gnuplot we get a time chart of usage on the 10 nodes and total CPU usage.
{{:clusters:farber:wiki3731.png?640|}}
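A per-node summary similar to the table above (count, minimum, maximum and average elapsed time) could be computed as in the sketch below. It assumes the host name and elapsed seconds of each task have already been collected into lines of the form ''host seconds''; that input format and the helper name ''node_summary'' are assumptions made for illustration.

```shell
#!/bin/bash
# Summarize elapsed seconds per node from stdin lines of the form
# "host seconds"; prints "host count min max average", sorted by host name.
node_summary() {
    awk '
        {
            host = $1; t = $2 + 0            # force a numeric comparison
            if (!(host in count)) { min[host] = t; max[host] = t }
            if (t < min[host]) min[host] = t
            if (t > max[host]) max[host] = t
            count[host]++; sum[host] += t
        }
        END {
            for (host in count)
                printf "%s %d %.2f %.2f %.2f\n", host, count[host],
                       min[host], max[host], sum[host] / count[host]
        }
    ' | sort
}

# Example with three illustrative tasks on two nodes.
printf 'n036 80\nn038 77\nn036 99\n' | node_summary
```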
==== Compiled Matlab in standby queue ====
Command to submit the 200-task array job to the standby queue (it must complete within 8 hours):
qsub -l standby=1 abatch.qs
=== SGE array job started Wed 22 Oct 2014 01:16:58 PM EDT ===
Used a total of 19856 CPU seconds over 127 seconds of elapsed time on 17 nodes
^ Node ^^ Real Clock Time ^^^ Ratio ^
^ Name ^ Count ^ Min ^ Max ^ Average ^ User/Real ^
| n000| 12| 86.00| 101.00| 93.33 | 1.00000|
| n003| 12| 84.00| 104.00| 93.75 | 1.00000|
| n004| 12| 86.00| 98.00| 90.42 | 1.00000|
| n019| 12| 81.00| 109.00| 96.25 | 1.00000|
| n022| 12| 83.00| 126.00| 116.42 | 1.00000|
| n031| 12| 86.00| 97.00| 90.25 | 1.00000|
| n032| 12| 76.00| 126.00| 111.75 | 1.00000|
| n036| 12| 88.00| 125.00| 109.08 | 1.00000|
| n045| 12| 76.00| 125.00| 110.50 | 1.00000|
| n051| 8| 77.00| 90.00| 83.12 | 1.00000|
| n064| 12| 85.00| 116.00| 99.75 | 1.00000|
| n073| 12| 83.00| 125.00| 117.00 | 1.00000|
| n074| 12| 88.00| 99.00| 91.58 | 1.00000|
| n077| 12| 86.00| 100.00| 93.50 | 1.00000|
| n080| 12| 85.00| 111.00| 102.17 | 1.00000|
| n083| 12| 85.00| 101.00| 93.42 | 1.00000|
| n088| 12| 86.00| 98.00| 90.08 | 1.00000|
{{:clusters:farber:wiki2482.png?640|}}