 +====== Matlab on DARWIN ======
For use on DARWIN, MATLAB projects should be developed using a desktop installation of MATLAB and then copied to DARWIN
to be run in batch. The extended example here uses one simple MATLAB function and two MATLAB scripts: one executes
the function in a loop, and the other executes it in parallel using the Parallel Computing Toolbox.
  
Although it is easier to develop on a desktop, MATLAB can also be run interactively on DARWIN. This is not recommended for long, computationally intensive scripts.
Two interactive jobs are demonstrated. The first tests the function by executing it once. The
second starts a pool of MATLAB workers in an interactive session and executes the function in a loop using the Parallel Computing Toolbox command **''parfor''**.
The Parallel Computing Toolbox gives a faster time to completion, but consumes more memory and CPU resources.
 +
You can run [[:software:matlab:DARWIN#Desktop |MATLAB as a Desktop GUI]] application on DARWIN, but again this is not recommended, as the graphics are slow to display, especially over a low-bandwidth network connection.
 +
Many MATLAB research projects fall in the "high throughput computing" category: a single run can be done on the desktop, but the goal is to complete hundreds or thousands of independent runs. This greatly increases disk, memory and CPU requirements.
For such projects, we give an example of the recommended workflow for scaling your job to multiple nodes: compile the MATLAB code with the single-thread option and deploy it as a Slurm array job.
 +
 +<note important>
The MATLAB Distributed Computing Server (MDCS), now called MATLAB Parallel Server, is not installed on DARWIN. This means jobs run with the Parallel Computing Toolbox can only run on one node, which limits both the size of the job and the number of workers you can use. That is why
an array job of compiled MATLAB code is recommended for large jobs.
 +</note>
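Because the pool is confined to one node, it is sensible to size it from the cores Slurm actually allocated rather than hard-coding a worker count. The following is a minimal sketch. It assumes your job script requests cores with ''%%--%%cpus-per-task'', in which case Slurm sets ''SLURM_CPUS_PER_TASK''. The helper name ''pool_batch_command'' is hypothetical, and ''local'' is the pool profile name used by the R2020b release shown on this page. Leave out ''-nojvm'' when using a pool, because the Parallel Computing Toolbox needs Java.

<code bash>
#!/bin/bash
# Hypothetical helper: build a MATLAB -batch statement that opens a pool
# sized to the cores Slurm allocated on this node, then runs the given
# MATLAB command. Falls back to 1 worker if SLURM_CPUS_PER_TASK is unset
# (for example, when the job did not request --cpus-per-task).
pool_batch_command() {
  local workers=${SLURM_CPUS_PER_TASK:-1}
  printf "parpool('local', %d); %s" "$workers" "$1"
}

# In a job script (not run here):
#   matlab -nodisplay -batch "$(pool_batch_command script)"
</code>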
 +
 +
 +
 +====== Getting Started ======
There will be several examples covered in the following sections. To make them easier to follow, create a new directory in your home directory '' ~/ '' or in your workgroup directory '' $WORKDIR '', and then ''cd'' into it. In the new directory, add the [[#matlab-function|maxEig.m]] and [[#matlab-script|script.m]] files. These two files are used in several of the examples.
 +<code>
 +[traine@login00 ~]$ mkdir matlab_example
 +[traine@login00 ~]$ cd matlab_example
 +</code>
 +<note tip>**Example Directories**
 +
As you go through the following examples, consider creating a new directory for each of them. This makes it easier to follow along and to keep track of the output files from the different jobs you will be running.
 +</note>
 +
Now create the following file in the ''~/matlab_example'' directory.
 +===== Matlab function =====
 +
 +We will be using this sample function on the DARWIN cluster in multiple demonstrations.
 +<file matlab maxEig.m>
 +function maxe = maxEig(sd,dim)
 +%  maxEig  maximum real eigenvalue of a normally distributed random matrix
 +%     Input parameters
 +%       sd - seed for random generator
 +%       dim - size of the square matrix
 +%     Output value
 +%       maxe - maximum real eigenvalue
 +  if (isdeployed)
 +    sd = str2num(sd)
 +    dim = str2num(dim)
 +  end
 +
 +  rng(sd);
 +  ev = eig( randn(dim) );
 +  maxe = max( ev(imag(ev)==0) )
 +end
 +</file>
 +
 +
The remainder of this section uses this MATLAB function to illustrate running MATLAB both interactively and in batch. The function will be executed interactively on multiple cores using multiple computational threads, and with 12 workers from a MATLAB pool. A MATLAB script will be run in batch to loop with multiple computational threads, again using a MATLAB pool.
 +
 +Finally it will be compiled and deployed using the MATLAB Compiler Runtime (MCR) environment.
 +
 +<note important>
We want only the real eigenvalues when computing the maximum. The matrix is a full matrix with both positive and negative elements, so its eigenvalues will be both real and complex. MATLAB has a function, ''isreal'', but it cannot select real values from a complex array: it tests the whole array and returns a single false for any complex array. Instead, we select the real eigenvalues by the property that their imaginary part is 0.0. This test is subject to round-off error. A real eigenvalue could be missed if rounding leaves it with a small nonzero imaginary part. A tolerance-based test would instead risk selecting complex eigenvalues with very small imaginary parts.
 +</note>
 +
<note tip>The last line of this function does not end with a semicolon, so the value is displayed with three lines of output for every function call. This is probably not what you want once you are confident your code is producing good results. To make the function silent, just add a semicolon. To produce more information packed into one line, you could call the ''fprintf'' function:
 +<code matlab>
 +  maxe = max( ev(imag(ev)==0) );
 +  fprintf('sd=%d counte=%d maxe=%.4f\n', sd, length(ev(imag(ev)==0)), maxe)
 +</code>
 +</note>
 +
 +
 +==== Matlab script ====
 +Now, write a MATLAB script file and put it in the ''~/matlab_example'' directory. It should have a comment on the first line describing the purpose of the script and have the ''quit'' command on the last line. This script will call the [[#matlab-function|maxEig function]] 200 times and report the average:
 +
 +<file matlab script.m>
 +% script to run maxEig function 200 times and print average.
 +
 +count = 200;
 +dim = 5001;
 +sumMaxe = 0;
 +tic;
 +for i=1:count;
 +  sumMaxe = sumMaxe + maxEig(i,dim);
 +end;
 +toc
 +avgMaxEig = sumMaxe/count
 +
 +quit
 +</file>
 +
This script calls the ''maxEig'' function. It does no file I/O, and all output goes to standard out. In MATLAB, assignments not terminated by a semicolon are displayed on the screen (standard out in batch).
 +
 +<note tip>
 +This script ends in a **__quit__** command (equivalent to MATLAB **__exit__**).  This is meant to be a complete script, which
 +terminates MATLAB when done.  If you run this from the bash command line (interactively) with the ''-r script'' option, it will come back with a bash prompt when completed.  If this is run from a batch job, then you can do other commands in your batch script after the MATLAB script completes.
 +
 +Without the **__quit__** you will come back to the MATLAB prompt on completion for an interactive job.  If this is the last line of a batch queue script, then the only difference will be the MATLAB prompt ''>>'' at the very end of the output file.  MATLAB treats the end of batch script file the same as exiting the window, which is the preferred way to exit the MATLAB GUI.
 +</note>
 +===== Copy the project folder =====
 +
If you created the files with your desktop version of MATLAB, now copy the folder to your ''~/matlab_example'' project directory on the cluster.
Use any [[abstract:darwin:transfer|file transfer client]] to copy your project directory.
 +====== Batch Job ======
 +You should have a copy of your MATLAB project directory on the cluster.
 +
 +<note important>**Versions of MATLAB**
 +
 +MATLAB has a new version twice a year.  It is important to keep the version you use on your desktop the same as the
 +one on the cluster.  The command
 +<code>
 +vpkg_versions matlab
 +</code>
 +will show you the versions available on a cluster.  Choose the one that matches the version on your desktop.  We recommend you do not upgrade MATLAB in the middle of a project, unless there is a new feature or bug fix you need.
 +</note>
 +
 +<note tip>**Two directories**
 +
It is frequently advisable to keep your MATLAB project directory free of non-MATLAB files, such as the job
script file and the script output file. But you may combine them, and even use the MATLAB editor to
create the job script file and look at the output file.
If you create the file on a Windows desktop, take care not to transfer the files as binary. See [[abstract:darwin:transfer|Transferring Files to/from DARWIN]] for details.
 +
 +When you have one combined directory, do not put the ''cd'' command in the queue script; instead, change
 +to the project directory using ''cd'' on the command line, before submitting your job.
 +</note>
 +===== Create a job script file =====
You should create a job script file to submit a batch job. Start by copying and modifying a batch job script template file. For example, to submit a serial job using one core on a compute node, copy ''/opt/shared/templates/slurm/generic/serial.qs'':
<code>
[traine@login00 matlab_example]$ cp /opt/shared/templates/slurm/generic/serial.qs matlab_first.qs
</code>
In your newly copied ''matlab_first.qs'' file, add the following lines at the end:
 +<code>
 +# Add vpkg_require commands after this line:
 +vpkg_require matlab
 +#Running the Matlab main_script
 +matlab -nodisplay -singleCompThread -batch main_script
 +</code>
Note that we did not specify a version of MATLAB with the VALET command, so we will get the default version (''*'') defined in VALET. This is fine for our examples, but for reproducibility of your jobs you should specify a MATLAB version, for example ''vpkg_require matlab/2020b''. Now create a new file called ''main_script.m'' and add the line below to it.
<code matlab>
disp('Hello World')
</code>
When this script runs, it will display ''Hello World''.
 +
 +===== Submit batch job =====
 +Your shell must be in a [[abstract:darwin:app_dev:compute_env#using-workgroup-and-directories|workgroup environment]]
 +to submit any jobs.
 +Use the ''sbatch'' command to submit a [[#batch-job|batch job]]
 +and note the ''<<JOBID>>'' that is assigned to your job.  For example, if your job script file name is ''matlab_first.qs'', then to
 +submit the job you would type
 +<code>
 +sbatch matlab_first.qs
 +</code>
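You can capture the job ID when you submit, so you can reuse it in later commands instead of copying it by hand. Slurm's ''sbatch %%--%%parsable'' option prints only the job ID, followed by '';cluster'' on multi-cluster setups. The sketch below instead parses the usual ''Submitted batch job <<JOBID>>'' message. The function name ''job_id_from_sbatch'' is hypothetical.

<code bash>
#!/bin/bash
# Hypothetical helper: read sbatch output on stdin and print the job ID
# from a line of the form "Submitted batch job <JOBID>". Prints nothing
# if no such line is present (e.g. when submission failed).
job_id_from_sbatch() {
  awk '/^Submitted batch job [0-9]+$/ { print $NF }'
}

# Usage on the login node, inside a workgroup shell (not run here):
#   jobid=$(sbatch matlab_first.qs | job_id_from_sbatch)
#   scontrol show job "$jobid"
</code>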
 +
 +<note important>**WARNING:  Please choose a workgroup before submitting jobs**
 +
 +This is the message you get if you are not in a workgroup.  
 +
 +     sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified
 +</note>
 +
 +<note warning>
 +
 +**Bash script vs job script**
 +
It is true that a job script file is (usually) a bash script, but it must be submitted with the ''sbatch'' command instead of being executed with the ''sh'' command. This way it is processed by the job scheduler, Slurm, which allocates the requested resources and runs the job on a compute node.
 +</note>
 +===== Wait for job to complete =====
 +You can [[abstract:darwin:runjobs:job_status#checking-job-status|check on the status]] of your job with the ''scontrol show job'' command. 
 +For example, to list the information for the job ''<<JOBID>>'', type
 +<code>
 +scontrol show job <<JOBID>>
 +</code>
To get information about a job that has already finished, use the ''sacct'' command.
 +For long-running jobs, you could change your job script to notify you via an email message when the job is 
 +complete.
 +
 +
 +===== Post process job =====
All MATLAB output data files will be in the project directory, but the MATLAB standard output will be in
the directory from which you submitted the job. If you did not redefine the Slurm output file for your job, look for a file named ''slurm-<<JOBID>>.out''.
 +
 +====== Interactive job ======
 +
 +Here are specific details for running MATLAB as an interactive job on a compute node.  You should have a copy of your MATLAB project directory on the cluster which will be referred to as a ''project_directory'' in the examples below.
 +
 +===== Command-line =====
 +
 +You should work on a compute node when in command-line MATLAB.
 +Your shell must be in a [[abstract:darwin:app_dev:compute_env#using-workgroup-and-directories|workgroup environment]]
 +to submit a single-threaded interactive job using ''salloc''.
 +
 +<code>
 +
 +[traine@login00 ~]$ workgroup -g it_css
 +[(it_css:traine)@login00 ~]$ salloc --partition=standard
 +salloc: Pending job allocation 7809686
 +salloc: job 7809686 queued and waiting for resources
 +salloc: job 7809686 has been allocated resources
 +salloc: Granted job allocation 7809686
 +salloc: Waiting for resource configuration
 +salloc: Nodes r1n00 are ready for job
 +[traine@r1n00 ~]$ vpkg_require matlab
 +Adding package `matlab/2020b` to your environment
 +[traine@r1n00 ~]$ cd matlab_example
 +[traine@r1n00 matlab_example]$ matlab -nodesktop -singleCompThread
 +MATLAB is selecting SOFTWARE OPENGL rendering.
 +
 +                                                   < M A T L A B (R) >
 +                                         Copyright 1984-2018 The MathWorks, Inc.
 +                                          R2020b (9.9.0.1467703) 64-bit (glnxa64)
 +                                                   August 26, 2020
 +
 +
 +To get started, type doc.
 +For product information, visit www.mathworks.com.
 +>>
 +</code>
 +
This will start an interactive command-line session in your terminal window. When done, type ''quit'' or ''exit'' to terminate the MATLAB session, and then ''exit'' to terminate the ''salloc'' session. Again, note that a specific version of MATLAB was not specified, so the default version defined in VALET was used. At the time of writing, this was version 2020b.
 +<code>
 +MATLAB is selecting SOFTWARE OPENGL rendering.
 +
 +                                                   < M A T L A B (R) >
 +                                          Copyright 1984-2018 The MathWorks, Inc.
 +                                          R2020b (9.9.0.1467703) 64-bit (glnxa64)
 +                                                  August 26, 2020
 +
 +
 +To get started, type doc.
 +For product information, visit www.mathworks.com.
 +>>quit
 +[traine@r1n00 matlab_example]$ exit
 +exit
 +salloc: Relinquishing job allocation 7809686
 +[(it_css:traine)@login01 ~]$
 +
 +</code>
 +
 +===== Desktop =====
 +You should be on a compute node before you start MATLAB. To start a MATLAB desktop (GUI mode) on a cluster, you must be running an X11 server and you must have connected to the cluster with ''ssh'' using X11 tunneling.
 +
 +You must be in a workgroup environment to submit a job using ''salloc''.
 +
 +<code bash>
 +[traine@login00 ~]$ workgroup -g it_css
 +[(it_css:traine)@login00 ~]$ salloc --x11 -N1 -n1 --partition=standard
 +salloc: Pending job allocation 7790913
 +salloc: job 7790913 queued and waiting for resources
 +salloc: job 7790913 has been allocated resources
 +salloc: Granted job allocation 7790913
 +salloc: Waiting for resource configuration
 +salloc: Nodes r1n00 are ready for job
 +[traine@r00n10 ~]$ vpkg_require matlab
 +Adding package `matlab/2020b` to your environment
 +[traine@r00n10 ~]$ matlab
 +MATLAB is selecting SOFTWARE OPENGL rendering.
 +
 +</code>
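If no MATLAB window appears, first check that the allocation actually received an X11 display. The following is a small sketch to run on the compute node after ''salloc %%--%%x11''. The function name ''check_x11_display'' is hypothetical, and it only inspects the standard ''DISPLAY'' variable.

<code bash>
#!/bin/bash
# Hypothetical helper: report whether X11 forwarding reached this shell.
# Returns 0 and prints the display if DISPLAY is set, 1 otherwise.
check_x11_display() {
  if [[ -z ${DISPLAY:-} ]]; then
    echo "DISPLAY is not set: reconnect with ssh -X (or -Y) and use salloc --x11" >&2
    return 1
  fi
  echo "X11 display: $DISPLAY"
}

# On the compute node (not run here):
#   check_x11_display && matlab
</code>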
 +
 +This will start an interactive MATLAB desktop GUI mode session on your desktop in an X11 window using your workgroup resources.
 +
 +{{ :software:matlab:matlab-xwin.jpg?600 |}}
 +
When done, type ''quit'' or ''exit'' in the command window, or close the window. When back at the terminal bash prompt, type ''exit'' to terminate the ''salloc'' interactive session and return to the login (head) node.
 +
 +See [[software:matlab:interactivetips|tips on starting MATLAB]] in an interactive session without the desktop, including executing a script. 
 +
 +For more information, review the instructions for [[http://www1.udel.edu/it/research/training/config_laptop/|setting up X11 connections with an SSH connection]] for Windows, Mac, and Linux OS. 
 +
 +For more information on GUI Applications on DARWIN, visit [[abstract:darwin:runjobs:schedule_jobs#launching-gui-applications-x11-forwarding|Launching GUI Applications (X11 Forwarding)]].
 +
 +====== Compiling with Matlab ======
 +
 +We show the three most common ways to work with compilers when using MATLAB.
 +
 +  - Compiling your MATLAB code to run in the MCR (MATLAB Compiler Runtime)
  - Compiling your C or Fortran program to call the MATLAB engine.
 +  - Compiling your own function in C or Fortran to be used in a MATLAB session.
 +
<note>Make sure your compiler is at least as new as the one required by your MATLAB version. In this example, MATLAB supported gcc 4.7.x, and using a newer gcc produced the warning:
 +<code>
 +Warning: You are using gcc version '4.9.3'. The version currently supported 
 +with MEX is '4.7.x'. For a list of currently supported compilers see: 
 +http://www.mathworks.com/support/compilers/current_release.
 +</code>
Despite the warning, the compilation completed successfully.
 +</note>
 +===== Compiling your Matlab code =====
 +
There is an example MCR project in the ''/opt/shared/templates/'' directory for you to copy and try. Copy it on the head node, and use ''salloc'' to compile it with MATLAB on the ''standard'' partition. Once your program is compiled, you can run it interactively or in batch without needing a MATLAB license.
 +
 +==== Copy dev-projects template ====
 +
 +On the head node, copy the example project into your current directory using the following commands
 +<code>
 +[traine@login01 ~]$ workgroup -g it_css
 +[(it_css:traine)@login01 ~]$ cd matlab_example
 +[(it_css:traine)@login01 matlab_example]$ cp -r /opt/shared/templates/dev-projects/Projects/MCR .
 +[(it_css:traine)@login01 matlab_example]$ cd MCR
 +
 +</code>
 +
 +==== Compile with make ====
 +
Now start an interactive session on a compute node to compile:
 +
 +<code>
 +[(it_css:traine)@login01 MCR]$ salloc --partition=standard
 +salloc: Granted job allocation 7861739
 +salloc: Waiting for resource configuration
 +salloc: Nodes r1n00 are ready for job
[traine@r1n00 MCR]$
 +</code>
 +
<note>Remember you must be in a workgroup before using ''salloc''. The prompt (''[(it_css:traine)@login01 MCR]$'') displays the workgroup (e.g. ''it_css'') in this example. Because no options other than the partition were specified, the job is assigned to the ''standard'' partition with the defaults of 30 minutes, 1 core and 1 GB of memory.</note>
 +
Check and edit the VALET command in the ''Makefile'' to load the appropriate version of the MATLAB Compiler Runtime (''mcr'') package. In this example, we edited the ''Makefile'' to load ''mcr/2019b:nojvm'', so the output from the ''make'' command is:
 +<code>
 +[traine@r1n00 MCR]$ make
 +Adding package `mcr/2019b` to your environment
 +make[1]: Entering directory `/home/2179/documents/matlab_example/MCR'
 +mcc -o maxEig -I ./common -R ""-nojvm,-nodesktop,-singleCompThread"" -v -m maxEig.m
 +Compiler version: 7.1 (R2019b)
 +Dependency analysis by REQUIREMENTS.
 +Parsing file "/home/1201/documents/matlab_example/MCR/maxEig.m"
 +        (referenced from command line).
 +Generating file "/home/1201/documents/matlab_example/MCR/readme.txt".
 +Generating file "run_maxEig.sh".
 +make[1]: Leaving directory `/home/1201/documents/matlab_example/MCR'
 +</code>
Take note of the package added and the files that are generated. The generated ''readme.txt'' and ''run_maxEig.sh'' files are not needed, so you can remove them.
Remember that the VALET command used to load the ''mcr'' package for compiling must also be used (same version of ''mcr'') to run your compiled code, whether interactively or in batch.
 +==== Test interactively ====
 +
To test interactively on the same compute node:
 +<code>
 +[traine@r1n00 MCR]$ vpkg_require mcr/2019b:nojvm
 +Adding package `mcr/2019b:nojvm` to your environment
 +[traine@r1n00 MCR]$ time ./maxEig 20.8
 +
 +maxe =
 +
 +  510.8787
 +
 +
 +real    0m4.249s
 +user    0m3.235s
 +sys     0m0.387s
 +</code>
 +
 +=== back to the head node ===
 +Type ''exit'' to terminate the ''salloc'' interactive session and return to the login (head) node.
 +<code>
 +[traine@r1n00 MCR]$ exit
 +exit
 +salloc: Relinquishing job allocation 7861739
 +[(it_css:traine)@login01 MCR]$
 +</code>
 +==== Test batch ====
 +=== Copy array job example ===
 +On the head node, copy the MCR array example project and the ''matlab-mcr.qs'' template job script file into your current directory using the following commands 
 +<code>
 +[(it_css:traine)@login01 ~]$ cd matlab_example
 +[(it_css:traine)@login01 matlab_example]$ cp -r /opt/shared/templates/dev-projects/Projects/MCR MCR_array
[(it_css:traine)@login01 matlab_example]$ cd MCR_array
 +[(it_css:traine)@login01 MCR_array]$ cp /opt/shared/templates/slurm/applications/matlab-mcr.qs .
 +[(it_css:traine)@login01 MCR_array]$  make
 +Adding package `mcr/2019b` to your environment
 +make[1]: Entering directory `/home/2179/documents/matlab_example/MCR_array'
 +mcc -o maxEig -I ./common -R ""-nojvm,-nodesktop,-singleCompThread"" -v -m maxEig.m
 +Compiler version: 7.1 (R2019b)
 +Dependency analysis by REQUIREMENTS.
 +Parsing file "/home/2179/documents/matlab_example/MCR_array/maxEig.m"
 +        (referenced from command line).
 +Generating file "/home/2179/documents/matlab_example/MCR_array/readme.txt".
 +Generating file "run_maxEig.sh".
 +make[1]: Leaving directory `/home/2179/documents/matlab_example/MCR_array'
 +</code>
 +
The following lines need to be changed or added in the ''matlab-mcr.qs'' file. Please read through all the comments. We have provided the line number preceding each line where a change is needed for this example. Keep in mind that this example is compiled with ''-singleCompThread'', so there is no need to request additional cores (''%%--%%ntasks'').
 +
 +<code>
 +...
 +36 #SBATCH --mem=3G
 +...
54 #SBATCH --job-name=matlab_mcr_array
 +...
 +65 #SBATCH --partition=standard
 +...
 +92 #SBATCH --output=MCR_array-%A-%3a.out
 +...
 +117 # Setting the job array options
 +118 #SBATCH --array=1-100:1
 +...
 +157 # Load a specific Matlab MCR package into the runtime environment:
 +158 #
 +159 vpkg_require mcr/2019b:nojvm
 +160
 +161 #
 +162 # Do standard MCR environment setup:
 +163 #
 +164 . /opt/shared/slurm/templates/libexec/matlab-mcr.sh
 +165
 +166 #
 +167 # Execute your MCR program(s) here; prefix with UD_EXEC to
 +168 # ensure the job can/will respond to preemption/termination
 +169 # signals by calling your UD_JOB_EXIT_FN.
 +170 #
 +171 # Duplicate all three commands for each MCR program you run
 +172 # in sequence below.
 +173 #
 +174 #UD_EXEC my_mcr_program arg1 arg2
 +175 #mcr_rc=$?
 +176 #if [ $mcr_rc -ne 0 ]; then exit $mcr_rc; fi
 +177
 +178 echo "Job Running on Host: $HOSTNAME"
 +179
 +180 start=$(date "+%s")
 +181 echo "Job Start: ${start}"
 +182
183 #Using the Slurm task ID as the argument lambda to maxEig
 +184 let lambda=$SLURM_ARRAY_TASK_ID
 +185
 +186 #Lines Added for MCR_array example
 +187 UD_EXEC ${HOME}/documents/matlab_example/MCR_array/maxEig $lambda
 +188 mcr_rc=$?
 +189 if [ $mcr_rc -ne 0 ]; then exit $mcr_rc; fi
 +190
 +191 finish=$(date "+%s")
 +192 echo "Job Finish: ${finish}"
 +193
 +194 runtime=$(($finish-$start))
 +195
 +196 echo "Total Runtime: ${runtime}"
 +</code>
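Passing ''$SLURM_ARRAY_TASK_ID'' straight to the program works when the task index is itself the parameter. When each task needs a different value, a common pattern is to keep one value per line in a text file and pick the line that matches the task index. The following is a sketch. Both ''params.txt'' and the helper ''param_for_task'' are hypothetical.

<code bash>
#!/bin/bash
# Hypothetical helper: print line number $2 of parameter file $1.
# Fails (status 1) if the index is not a positive integer or the file
# has no such non-empty line, so the job can stop instead of running
# the program with an empty argument.
param_for_task() {
  local file=$1 task=$2 value
  [[ $task =~ ^[1-9][0-9]*$ ]] || return 1
  value=$(sed -n "${task}p" "$file") || return 1
  [[ -n $value ]] || return 1
  printf '%s\n' "$value"
}

# In matlab-mcr.qs, in place of the "let lambda=..." line (not run here):
#   lambda=$(param_for_task params.txt "$SLURM_ARRAY_TASK_ID") || exit 1
#   UD_EXEC ${HOME}/documents/matlab_example/MCR_array/maxEig $lambda
</code>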
 +
 +Example ''sbatch'' submission
 +
 +<code>
[(it_css:traine)@login01 MCR_array]$ sbatch matlab-mcr.qs
Submitted batch job 4607879
[(it_css:traine)@login01 MCR_array]$ date
Wed Oct  4 10:51:41 EDT 2023
[(it_css:traine)@login01 MCR_array]$ date
Wed Oct  4 10:52:06 EDT 2023
[(it_css:traine)@login01 MCR_array]$ ls -l MCR_array-4607879* | wc -l
 +100
 +</code>
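Counting files confirms the total, but not which tasks are missing. In the ''%%--%%output=MCR_array-%A-%3a.out'' pattern, ''%A'' is the array's master job ID and ''%3a'' is the task index zero-padded to three digits. This means the expected names can be generated and checked. The following is a sketch with a hypothetical helper, ''missing_array_outputs''. Run it from the directory that holds the output files.

<code bash>
#!/bin/bash
# Hypothetical helper: list array task indices in [first, last] whose
# output file MCR_array-<jobid>-<NNN>.out is missing in the current
# directory (NNN = index zero-padded to 3 digits, as %3a produces).
missing_array_outputs() {
  local jobid=$1 first=$2 last=$3 i f
  for (( i = first; i <= last; i++ )); do
    printf -v f 'MCR_array-%s-%03d.out' "$jobid" "$i"
    [[ -e $f ]] || echo "$i"
  done
}

# Example (not run here): missing_array_outputs 4607879 1 100
</code>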
 +
There are 100 output files, named ''MCR_array-4607879-001.out'' to ''MCR_array-4607879-100.out''.
For example, file 50, ''MCR_array-4607879-050.out'', looks like this:
 +
 +<code>
 +Adding package `mcr/2019b:nojvm` to your environment
 +-- Matlab MCR environment setup complete (on r1n02):
 +--  MCR_ROOT             = /opt/shared/matlab/2019b
 +--  MCR_CACHE_ROOT       = /tmp
 +
 +Job Running on Host: r1n02
 +Job Start: 1696431364
 +
 +maxe =
 +
 +  525.9320
 +
 +Job Finish: 1696431371
 +Total Runtime: 7
 +</code>
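Each output file holds one ''maxe'' value, so summarizing the array means collecting those values. The following is a minimal sketch with a hypothetical helper, ''average_maxe''. It assumes the display format shown above, where ''maxe ='' is followed by the value on a later line. It also accepts a one-line ''maxe = value'' form, in case your display settings or release print that instead; this is an assumption, not something shown on this page.

<code bash>
#!/bin/bash
# Hypothetical helper: average the maxe values displayed in MATLAB
# output files. Handles "maxe =" with the value on the next non-blank
# line, and the one-line form "maxe = 525.9320".
average_maxe() {
  awk '
    FNR == 1 { grab = 0 }                        # reset at each new file
    /^maxe =/ {
      if (NF >= 3) { sum += $3; n++ } else { grab = 1 }
      next
    }
    grab && NF > 0 { sum += $1; n++; grab = 0 }  # first non-blank line
    END {
      if (n == 0) { print "no maxe values found"; exit 1 }
      printf "%d values, average maxe = %.4f\n", n, sum / n
    }
  ' "$@"
}

# Example (not run here): average_maxe MCR_array-4607879-*.out
</code>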
 +
 +
 +===== Compiling your code to use MATLAB engine ======
 +
Here is a simple example program called ''**fengdemo.F**'', coded in Fortran, which you can copy and use as a starting point.
 +
 +On the head node and in your workgroup shell:
 +
 +<code>
 +[(it_css:traine)@login01 ~]$ cd matlab_example
 +[(it_css:traine)@login01 matlab_example]$ mkdir matlab_compile
 +[(it_css:traine)@login01 matlab_example]$ cd matlab_compile
 +[(it_css:traine)@login01 matlab_compile]$ vpkg_require matlab/2020b gcc/10.1
 +[(it_css:traine)@login01 matlab_compile]$ cp $MATLABROOT/extern/examples/eng_mat/fengdemo.F  .
[(it_css:traine)@login01 matlab_compile]$ export LD_LIBRARY_PATH=$MATLABROOT/bin/glnxa64:$MATLABROOT/sys/os/glnxa64:$LD_LIBRARY_PATH
 +[(it_css:traine)@login01 matlab_compile]$ mex -client engine fengdemo.F
 +Building with 'gfortran'.
 +MEX completed successfully.
[(it_css:traine)@login01 matlab_compile]$
 +</code>
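The ''export'' line above has one subtlety. If ''LD_LIBRARY_PATH'' was empty, the result ends in a trailing colon, and glibc's dynamic loader treats an empty entry as the current directory. The following is a sketch of a hypothetical helper, ''prepend_ld_path''. It prepends directories in order, skips ones already present, and never leaves an empty entry.

<code bash>
#!/bin/bash
# Hypothetical helper: prepend directories to LD_LIBRARY_PATH, keeping
# the argument order, skipping directories already listed, and never
# leaving an empty (current-directory) entry when the variable was unset.
prepend_ld_path() {
  local i dir
  for (( i = $#; i >= 1; i-- )); do     # last argument first, so $1 ends up first
    dir=${!i}
    case ":${LD_LIBRARY_PATH:-}:" in
      *":$dir:"*) ;;                    # already present: leave as is
      *) LD_LIBRARY_PATH="$dir${LD_LIBRARY_PATH:+:$LD_LIBRARY_PATH}" ;;
    esac
  done
  export LD_LIBRARY_PATH
}

# After "vpkg_require matlab/2020b gcc/10.1" (not run here):
#   prepend_ld_path "$MATLABROOT/bin/glnxa64" "$MATLABROOT/sys/os/glnxa64"
</code>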
 +
 +To run this program, it will require running an interactive session on a compute node with X11 forwarding enabled.  Here is an example for user ''traine'' in workgroup ''it_css'':
 +
 +<code>
 +[(it_css:traine)@login01 matlab_compile]$ salloc --x11 -N1 -n1 --partition=standard
 +salloc: Granted job allocation 4607984
 +salloc: Waiting for resource configuration
 +salloc: Nodes r1n00 are ready for job
[traine@r03g07 matlab_compile]$ vpkg_require matlab/2020b gcc/10.1
 +Adding package `matlab/2020b` to your environment
 +Adding package `gcc/10.1.0` to your environment
[traine@r03g07 matlab_compile]$ export LD_LIBRARY_PATH=$MATLABROOT/bin/glnxa64:$MATLABROOT/sys/os/glnxa64:$LD_LIBRARY_PATH
 +[traine@r03g07 matlab_compile]$ ./fengdemo
 +</code>
 +
Shortly after you start the program, ''./fengdemo'', a MATLAB window will open and display the chart below:
 +
 +{{ :software:matlab:matlab_engine_compile.jpg?600 |}}
 +
After the MATLAB window opens, you will see a prompt in the terminal to "Exit" or "Continue". Typing ''1'' and pressing the ''Enter'' key will return the table shown below.
 +
After the table is returned, close the MATLAB window with the chart. Then use the ''exit'' command to release the compute node.
 +
 +<code>
 + Type 0 <return> to Exit
 + Type 1 <return> to continue
 +1
 + MATLAB computed the following distances:
 +   time(s)  distance(m)
 +   1.00     -4.90
 +   2.00     -19.6
 +   3.00     -44.1
 +   4.00     -78.4
 +   5.00     -123.
 +   6.00     -176.
 +   7.00     -240.
 +   8.00     -314.
 +   9.00     -397.
 +   10.0     -490.
 +[traine@r1n00 matlab_compile]$ exit
 +salloc: Relinquishing job allocation 4607984
 +[(it_css:traine)@login01 matlab_compile]$
 +</code>
 +
 +===== Compiling your own MATLAB function ======
 +
There is a simple example function, ''**timestwo.c**'', coded in C, which you can copy and use as a starting point.
 +
 +On the head node and in a workgroup shell:
 +
 +<code>
 +[(it_css:traine)@login01 ~]$ cd matlab_example
 +[(it_css:traine)@login01 matlab_example]$ mkdir matlab_function
 +[(it_css:traine)@login01 matlab_example]$ cd matlab_function
 +[(it_css:traine)@login01 matlab_function]$ vpkg_require matlab/2020b gcc/10.1
 +Adding package `matlab/r2020b` to your environment
 +Adding package `gcc/10.1.0` to your environment
 +[(it_css:traine)@login01 matlab_function]$ cp $MATLABROOT/extern/examples/refbook/timestwo.c .
 +[(it_css:traine)@login01 matlab_function]$ mex timestwo.c
 +Building with 'gcc'.
 +Warning: gcc version '10.1.0' not supported. See https://www.mathworks.com/support/compilers/current_release.
 +MEX completed successfully.
 +[(it_css:traine)@login00 matlab_function]$
 +
 +</code>
 +
 +To start MATLAB on a compute node to test this new function:
 +
 +<code>
 +[(it_css:traine)@login01 matlab_function]$ salloc --partition=standard
 +salloc: Pending job allocation 4607991
 +salloc: job 4607991 queued and waiting for resources
 +salloc: job 4607991 has been allocated resources
 +salloc: Granted job allocation 4607991
 +salloc: Waiting for resource configuration
 +salloc: Nodes r1n00 are ready for job
 +[traine@r00n56 matlab_function]$ vpkg_require matlab/2020b gcc/10.1
 +[traine@r00n56 matlab_function]$ matlab -nodesktop
 +MATLAB is selecting SOFTWARE OPENGL rendering.
 +
 +                                             < M A T L A B (R) >
 +                                    Copyright 1984-2019 The MathWorks, Inc.
 +                                    R2020b (9.9.0.1467703) 64-bit (glnxa64)
 +                                              August 26, 2020
 +
 +
 +To get started, type doc.
 +For product information, visit www.mathworks.com.
 +
 +>>
 +</code>
 +
 +Now test the function by typing ''timestwo(4)''. The results are shown below. Afterwards type ''quit'' to exit Matlab and then type ''exit'' to release the compute node.
 +
 +<code>
 +>> timestwo(4)
 +
 +ans =
 +
 +     8
 +
 +>> quit
 +[traine@r1n00 matlab_function]$ exit
 +exit
 +salloc: Relinquishing job allocation 4607991
 +[(it_css:traine)@login01 matlab_function]$
 +</code>
 +====== Batch job serial example ======
 +
 +Second, write a shell script file to set the MATLAB environment and start MATLAB running your script file. The following script file will set the MATLAB environment and run the command in the [[#matlab-script|script.m]] file:
 +
 +<note>
 +''script.m'' calls the ''maxEig'' function in the [[#matlab-function|maxEig.m]] file. Make sure this is also in the directory.
 +</note>
 +
 +
 +<code>
 +[(it_css:traine)@login01 ~]$ cd matlab_example
 +[(it_css:traine)@login01 matlab_example]$ mkdir matlab_slurm
 +[(it_css:traine)@login01 matlab_example]$ cd matlab_slurm
 +[(it_css:traine)@login01 matlab_slurm]$ cp /opt/shared/templates/slurm/generic/serial.qs batch.qs
 +[(it_css:traine)@login01 matlab_slurm]$ vim batch.qs
 +</code>
 +<file bash batch.qs>
 +...
 +40 #SBATCH --job-name=script.m
 +...
 +50 #SBATCH --partition=standard
 +...
 +67 #SBATCH --time=0-03:00:00
 +...
 +76 #SBATCH --output=%x-%j.out
 +77 #SBATCH --error=%x-%j.out
 +...
 +86 #SBATCH --mail-user='traine@udel.edu'
 +87 #SBATCH --mail-type=END,FAIL,TIME_LIMIT_90
 +...
 +137 #
 +138 # [EDIT] Add your script statements hereafter, or execute a script or program
 +139 #        using the srun command.
 +140 #
 +141 #srun date
 +142 #Loading MATLAB
 +143 vpkg_require matlab/2020b
 +144 #Running the matlab script
 +145 matlab -nodisplay -nojvm -batch script
 +
 +</file>
 +
 +
 +Make sure you change the ''%%--%%mail-user'' from ''traine@udel.edu'' to your preferred email address. The ''-nodisplay'' indicates no X11 graphics, which implies ''-nosplash -nodesktop''. The ''-nojvm'' indicates no Java. (Java is needed for some functions, e.g., print graphics, but should be excluded for most computational jobs.)
The ''-batch'' option is followed by a MATLAB command, enclosed in quotes when there are spaces in the command.
 +
 +<note tip>**Errors in the Matlab script**:
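The command ''script'' will execute the lines in the ''script.m'' file. When MATLAB is started with ''-r'' instead of ''-batch'', some errors make MATLAB display the error message and wait for a response, which is clearly not appropriate for a batch job. With ''-batch'', MATLAB prints the error and exits with a non-zero status. To get a more detailed report in either case, consider replacing ''script'' with
the compound command
<code>
"try; script; catch ERR; disp(getReport(ERR,'extended')); exit(1); end"
</code>
The purpose of the **''try/catch''** block is to catch the first error in the script and display a report. With the **''extended''** option, the report will include a stack trace at the point of the error. The ''exit(1)'' makes MATLAB end with a non-zero status, so the job is not reported as successful.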
 +</note>
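''-batch'' ends MATLAB with a non-zero exit status when an uncaught error occurs. The job script can check that status so Slurm records the job as failed instead of completed. If you use a try/catch wrapper, its catch branch must also exit with a non-zero code, for example ''exit(1)''. The following is a sketch of a hypothetical wrapper, ''run_matlab_batch'', modelled on the ''mcr_rc'' check in the ''matlab-mcr.qs'' example above.

<code bash>
#!/bin/bash
# Hypothetical wrapper: run one MATLAB statement non-interactively and
# pass MATLAB's exit status back to the job script.
run_matlab_batch() {
  local rc
  matlab -nodisplay -nojvm -batch "$1"
  rc=$?
  if (( rc != 0 )); then
    echo "MATLAB exited with status $rc while running: $1" >&2
  fi
  return "$rc"
}

# At the end of batch.qs (not run here):
#   run_matlab_batch script || exit $?
</code>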
 +
 +<note tip>**Graphics in the Matlab script**
 +
 +  * Do not include the ''-nojvm'' on the **matlab** command.
 +  * Do set paper dimensions and print each figure to a file.
 +
The text output will be included in the standard Slurm output file, but not any graphics. All figures must be exported using the **print** command. By default, the **print** command prints on an 8 1/2 by 11 inch page with margins suited to a printed sheet of paper. That size and those margins will not work well if you plan to include the figure in a paper or a web page.
 +
 +We suggest setting the current figure's ''PaperUnits'', ''PaperSize'' and ''PaperPosition'' properties; MATLAB provides a handle to the current figure (**gcf**).  For example, the commands
 +
 +<code matlab>
 +  set(gcf,'PaperUnits','inches','PaperSize',[4,3],'PaperPosition',[0 0 4 3]);
 +  print('-dpng','-r100','maxe.png');
 +</code>
 +
 +will set the current figure to 4 x 3 inches with no margins, and then print it as a 400x300 pixel ''png'' file (4 x 3 inches at the 100 dpi set by ''-r100'').
 +</note>
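The pixel size of the PNG is the ''PaperSize'' in inches multiplied by the ''-r'' resolution in dots per inch. The sketch below does that arithmetic. The helper name ''png_pixels'' is hypothetical, and it uses ''awk'' because the shell itself has no floating-point arithmetic.

```shell
#!/bin/bash
# Hypothetical helper: pixel dimensions of a figure printed with -r<dpi>
#   $1 = width in inches, $2 = height in inches, $3 = resolution (dpi)
png_pixels() {
  # Round to the nearest pixel so values like 4.2 * 100 come out as 420
  awk -v w="$1" -v h="$2" -v dpi="$3" 'BEGIN { printf "%.0fx%.0f\n", w * dpi, h * dpi }'
}

png_pixels 4 3 100    # PaperSize [4,3] printed with -r100
```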
 +
 +==== Submit job ====
 +Third, from the directory with ''script.m'', ''maxEig.m'' and ''batch.qs'', submit the batch job with the command:
 +
 +<code>
 +sbatch batch.qs
 +</code>
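''sbatch'' reports the job number on a line such as ''Submitted batch job 4610424''. If you want to keep the ''JOBID'' in a shell variable, for example to find the output file later, a minimal sketch is to take the last field of that line. The helper name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: extract the job ID from sbatch's normal output
jobid_from_sbatch() {
  # Read lines like "Submitted batch job 4610424" and print the last field
  awk '/^Submitted batch job/ { print $NF }'
}

# On DARWIN you would pipe the real command:  jobid=$(sbatch batch.qs | jobid_from_sbatch)
echo "Submitted batch job 4610424" | jobid_from_sbatch
```

Slurm's ''sbatch %%--%%parsable'' option prints only the job ID (followed by the cluster name on multi-cluster systems), which avoids the parsing step. With the job name ''script.m'', the output file is then ''script.m-$jobid.out''.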
 +==== Wait for completion ====
 +Finally, wait for the mail notification, which will be sent to ''traine@udel.edu'' unless you changed it to your preferred email address.  When the job is done, the output from the MATLAB command will be in a file with the pattern ''script.m-<<JOBID>>.out'', where ''JOBID'' is the number assigned to your job.
 +
 +In this example, after about 2 hours a message was received from the SLURM Administrator. The email has a subject line like the one shown below and an empty body.
 +<code>
 +SLURM Job_id=4608094, Name=script.m Ended, Run time 02:07:16, COMPLETED, ExitCode 0
 +</code>
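If you sort or filter these notifications, the subject line is easy to take apart. Below is a sketch with ''sed'' that assumes the exact layout shown above for a job that ended. Subjects for other events may be worded differently and will not match. The helper name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: split a Slurm "Ended" notification subject into fields
# Input:  SLURM Job_id=4608094, Name=script.m Ended, Run time 02:07:16, COMPLETED, ExitCode 0
# Output: <jobid> <name> <runtime> <state> <exitcode>
parse_slurm_subject() {
  sed -n 's/^SLURM Job_id=\([0-9]*\), Name=\(.*\) Ended, Run time \([0-9:-]*\), \([A-Z_]*\), ExitCode \([0-9]*\)$/\1 \2 \3 \4 \5/p'
}

echo 'SLURM Job_id=4608094, Name=script.m Ended, Run time 02:07:16, COMPLETED, ExitCode 0' | parse_slurm_subject
```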
 +
 +==== Gather results ====
 +The results for Job 4608094 are in the file
 +<file text script.m-4608094.out >
 +Fri Apr 10 16:36:38 EDT 2020
 +Adding package `matlab/2020b` to your environment
 +maxe =
 +
 +   70.0220
 +
 +
 +maxe =
 +
 +   71.7546
 +
 +
 +maxe =
 +
 +   70.8331
 +
 +
 +maxe =
 +
 +   70.5714
 +
 +
 +maxe =
 +
 +   69.4923
 +
 +
 +maxe =
 +
 +   67.7814
 +
 +
 +maxe =
 +
 +   70.5037
 +
 +
 +maxe =
 +
 +   68.3293
 +
 +
 +maxe =
 +
 +   69.5694
 +
 +       ...  //Skipping 953 similar displays of variable maxe//
 +
 +maxe =
 +
 +    67.4221
 +Elapsed time is 7628.882218 seconds.
 +
 +avgMaxEig =
 +
 +   69.5131
 +
 +</file>
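The printed ''maxe'' values can also be collected from the output file without starting MATLAB again. Here is a minimal sketch with ''awk''. It assumes each value appears on the first single-token line after a ''maxe ='' line, as in the file above. The helper name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: average the "maxe =" values in a MATLAB output file
#   $1 = output file, e.g. script.m-4608094.out
avg_maxe() {
  awk '
    /^ *maxe =/ { want = 1; next }              # the value follows on a later line
    want && NF == 1 { sum += $1; n++; want = 0 }
    END { if (n) printf "%d values, average %.4f\n", n, sum / n }
  ' "$1"
}

# Usage:  avg_maxe script.m-4608094.out
```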
 +==== Timings and core count ====
 +
 +Consider a batch job run with these Slurm options: 
 +<code>
 +
 +     #SBATCH --ntasks=5
 +     #SBATCH --mem=1G
 +     #SBATCH --job-name=script_opt.m
 +</code>
 +
 +The ''sbatch'' command assigns a ''JOBID''. Once the job starts running, the ''squeue'' command shows the node it is running on; set a shell variable to that node name (here ''n=r1n00'') and refer to it as ''$n'' in the commands that follow.  After about 10 minutes of running:
 +<code>
 +[(it_css:traine)@login01 matlab_slurm]$ n=r1n00
 +[(it_css:traine)@login01 matlab_slurm]$ echo $n
 +r1n00
 +[(it_css:traine)@login01 matlab_slurm]$ ssh $n ps -eo pid,ruser,pcpu,pmem,thcount,stime,time,command | egrep '(COMMAND|matlab)'
 +  PID RUSER    %CPU %MEM THCNT STIME     TIME COMMAND
 +2277 traine    98.4  0.1    10 13:56 00:19:53 /opt/shared/matlab/2020b/bin/glnxa64/MATLAB -nodisplay -batch script -nojvm
 +</code>
 +This ''ps'' command reports the percent CPU (which can exceed 100% when a job keeps several cores busy), the percent memory, the thread count (here 10, more than the 5 tasks requested), the start time, the accumulated CPU time, and finally the full command used to start the job.
 +
 +Given the reported PID ''2277'', you can drill down and see which of the 10 threads are consuming CPU time:
 +<code>
 +[(it_css:traine)@login01 matlab_slurm]$ ssh $n ps -eLf | egrep '(PID|2277)' | grep -v ' 0  '
 +UID        PID  PPID   LWP  C NLWP STIME TTY          TIME CMD
 +traine   2277   2223   2430 98 10 13:56  ?        00:27:05 /opt/shared/matlab/2020b/bin/glnxa64/MATLAB -nodisplay -batch script -nojvm
 +</code>
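Rather than copying the PID by hand, you can pick it out of the ''ps'' listing. This sketch assumes the column layout of the ''ps -eo pid,...'' command above, with the PID in the first column. The helper name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the PID of the first MATLAB process in a ps listing
matlab_pid() {
  # Skip the header line; match the MATLAB binary path in the command column
  awk 'NR > 1 && /\/MATLAB / { print $1; exit }'
}

# Usage on DARWIN (n holds the node name):
#   pid=$(ssh $n ps -eo pid,ruser,pcpu,pmem,thcount,stime,time,command | matlab_pid)
#   ssh $n ps -eLf | egrep "(PID|$pid)" | grep -v ' 0  '
```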
 +
 +While the batch job was running on node ''r1n00'', the ''top'' command was run once in batch mode (''-b -n 1'' takes a single snapshot) to sample the resources being used by MATLAB.
 +Like ''ps'' above, it can only be run through ''ssh'' on compute nodes where you have jobs running.
 +
 +<code>
 +[(it_css:traine)@login01 matlab_slurm]$ ssh $n top -b -n 1 | egrep '(COMMAND|MATLAB)' | grep -v ' 0'
 +  PID USER      PR  NI    VIRT    RES    SHR S %CPU %MEM     TIME+ COMMAND
 +2277 traine    20   0 1888856 917824 126216  S 88.9  0.2  28:47.25 MATLAB     
 +</code>
 +
 +
 +<code>
 +[(it_css:traine)@login01 matlab_slurm]$ qhost -h $n
 +HOSTNAME                ARCH         NCPU NSOC NCOR NTHR NLOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
 +----------------------------------------------------------------------------------------------
 +r1n00                  standard      64    2   64   64  0.93  488.0G    70.0G     0.0     0.0  
 +</code>
 +
 +After the job is done you can use ''sacct'' to get a recap of resources used:
 +<code>
 +[(it_css:traine)@login00 matlab_slurm]$ sacct -N r1n00 -j 4608094 -o jobName,jobID,Nodelist,maxVMSize,MaxRSS,CPUTime,Start,End,Elapsed,State
 +   JobName        JobID        NodeList  MaxVMSize     MaxRSS    CPUTime               Start                 End    Elapsed      State
 +---------- ------------ --------------- ---------- ---------- ---------- ------------------- ------------------- ---------- ----------
 +script_op+ 4608094               r1n00                         02:07:16 2023-10-04T14:39:57 2023-10-04T16:47:13   02:07:16  COMPLETED
 +     batch 4608094.bat+          r1n00   169808K    812344K    02:07:16 2023-10-04T14:39:57 2023-10-04T16:47:13   02:07:16  COMPLETED
 +    extern 4608094.ext+          r1n00   183024K             02:07:16 2023-10-04T14:39:57 2023-10-04T16:47:13   02:07:16  COMPLETED
 +
 +</code>
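''sacct'' reports ''MaxRSS'' and ''MaxVMSize'' with a unit suffix (''K'' above). The sketch below converts these values to whole mebibytes so you can compare them with a ''%%--%%mem'' request. The helper name is hypothetical, and it handles only integer values.

```shell
#!/bin/bash
# Hypothetical helper: convert an sacct memory value such as 812344K to whole MiB
to_mib() {
  local v="$1" num="${1%[KMGT]}"          # strip a trailing unit letter
  case "$v" in
    *K) echo $(( num / 1024 )) ;;
    *M) echo "$num" ;;
    *G) echo $(( num * 1024 )) ;;
    *T) echo $(( num * 1024 * 1024 )) ;;
    *)  echo $(( num / 1048576 )) ;;      # no suffix: bytes
  esac
}

to_mib 812344K     # MaxRSS of the batch step above
```

For example, the ''MaxRSS'' of 812344K above is 793 MiB, well under the ''%%--%%mem=1G'' request.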
 +
 +
 +
 +====== Batch job parallel example ======
 +
 +The MATLAB Parallel Computing Toolbox uses the JVM to manage the workers and to communicate with them while the job runs.  You
 +need to set up the MATLAB pool in your ''script''.
 +
 +==== Matlab parallel script ====
 +Here is the slightly modified MATLAB script. 
 +
 +Add the necessary commands to configure your ''parcluster'' and ''parpool'', and change ''for'' ⇒ ''parfor''.
 +<file text pscript.m>
 +% script to run maxEig function 200 times
 +%% Configure parpool
 +myCluster = parcluster('local');
 +myCluster.NumWorkers = str2double(getenv('SLURM_NTASKS'));
 +myCluster.JobStorageLocation = getenv('TMPDIR');
 +myPool = parpool(myCluster, myCluster.NumWorkers);
 +
 +count = 200;
 +dim = 5001;
 +sumMaxe = 0;
 +
 +tic
 +parfor i=1:count;
 +  sumMaxe = sumMaxe + maxEig(i,dim);
 +end
 +toc
 +avgMaxEig = sumMaxe/count
 +
 +delete(myPool);
 +exit
 +
 +</file>
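The pool size is read from the ''SLURM_NTASKS'' environment variable. Outside a Slurm job that variable is unset, ''getenv'' returns an empty string, and ''str2double'' turns it into ''NaN'', which is not a valid worker count. Below is a minimal sketch of a shell check you could place in the job script before starting MATLAB. The function name is hypothetical. Because the ''local'' profile starts all workers on the node where MATLAB runs, you may also want to request ''%%--%%nodes=1'' so that all tasks land on that node.

```shell
#!/bin/bash
# Hypothetical check: print SLURM_NTASKS, or fail if it is unset or not a number
check_ntasks() {
  case "${SLURM_NTASKS:-}" in
    ''|*[!0-9]*) echo "SLURM_NTASKS is not set to a number" >&2; return 1 ;;
    *) echo "$SLURM_NTASKS" ;;
  esac
}

# Usage in pbatch.qs, just before the matlab line:
#   check_ntasks > /dev/null || exit 1
```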
 +
 +==== Slurm parallel script ====
 +Remove the option ''-nojvm'', because JVM is needed for the Parallel Computing toolbox commands.
 +Copy the template ''matlab.qs'' script and name it ''pbatch.qs'' by typing
 +<code>
 +cp /opt/shared/templates/slurm/applications/matlab.qs ./pbatch.qs
 +</code>
 +Make the following changes to the template:
 +<file bash pbatch.qs>
 +...
 +19 #SBATCH --ntasks=20
 +...
 +37 #SBATCH --mem=60G
 +...
 +54 #SBATCH --job-name=matlab-pscript
 +...
 +65 #SBATCH --partition=standard
 +...
 +75 #SBATCH --time=0-01:00:00
 +...
 +82 #SBATCH --time-min=0-00:30:00
 +...
 +90 #SBATCH --output=%x-%j.out
 +91 #SBATCH --error=%x-%j.out
 +...
 +155 vpkg_require matlab/2020b
 +...
 +170 UD_EXEC matlab -nodisplay -batch pscript
 +
 +</file>
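Forgetting to remove ''-nojvm'' is an easy mistake when adapting the serial job script. This sketch shows a check you could run before submitting. The function name is hypothetical, and it ignores lines where a ''#'' comes before ''matlab''.

```shell
#!/bin/bash
# Hypothetical pre-submit check: fail if an uncommented matlab command line
# in the job script still contains -nojvm (the Parallel Computing Toolbox needs the JVM)
check_jvm() {
  local qs="$1"
  if grep -E '^[^#]*matlab.*-nojvm' "$qs" > /dev/null; then
    echo "$qs: remove -nojvm for Parallel Computing Toolbox jobs" >&2
    return 1
  fi
  return 0
}

# Usage:  check_jvm pbatch.qs && sbatch pbatch.qs
```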
 +
 +==== Timing results ====
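 +Usage reported by ''sacct'' for the same computation run with the Parallel Computing Toolbox (''pscript'', job 4610111) and without it (''script.m'', job 4610362):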
 +<code>
 +[(it_css:traine)@login00 matlab_slurm]$ sacct  -j 4610111,4610362 -o jobName,jobID,Nodelist,maxVMSize,MaxRSS,CPUTime,Start,End,Elapsed,State
 +   JobName        JobID        NodeList  MaxVMSize     MaxRSS    CPUTime               Start                 End    Elapsed      State
 +---------- ------------ --------------- ---------- ---------- ---------- ------------------- ------------------- ---------- ----------
 +   pscript 4610111                r1n44                         05:54:20 2023-10-06T11:02:36 2023-10-06T11:20:19   00:17:43  COMPLETED
 +     batch 4610111.bat+           r1n44    169940K  22162024K   05:54:20 2023-10-06T11:02:36 2023-10-06T11:20:19   00:17:43  COMPLETED
 +    extern 4610111.ext+           r1n44      4364K          0   05:54:20 2023-10-06T11:02:36 2023-10-06T11:20:19   00:17:43  COMPLETED
 +  script.m 4610362                r1n37                       1-14:42:20 2023-10-06T14:45:41 2023-10-06T16:41:48   01:56:07  COMPLETED
 +     batch 4610362.bat+           r1n37    169940K   1092448K 1-14:42:20 2023-10-06T14:45:41 2023-10-06T16:41:48   01:56:07  COMPLETED
 +    extern 4610362.ext+           r1n37      4364K          0 1-14:42:40 2023-10-06T14:45:41 2023-10-06T16:41:49   01:56:08  COMPLETED
 +
 +</code>
 +
 +Compare script vs pscript
 +
 +^ Job ^ Elapsed Time ^ CPUTime ^ Max RSS^
 +| script.m | 01:56:07|  1-14:42:20| 1092448K |
 +| pscript | 00:17:43 | 05:54:20 | 22162024K |
 +
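 +The job **script** accumulated more ''CPUTime'' because it held its allocation longer while relying on MATLAB's multiple computational threads, whereas **pscript** used more memory with its 20 single-threaded workers and finished in about 18 minutes instead of nearly 2 hours.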
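''sacct'' defines ''CPUTime'' as the elapsed time multiplied by the number of allocated CPUs. It therefore measures what a job reserved rather than how busy MATLAB kept those CPUs, and dividing the two columns recovers the CPU count. The sketch below assumes the ''[D-]HH:MM:SS'' format shown above. The helper names are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: convert sacct times like 1-14:42:20 or 01:56:07 to seconds
to_seconds() {
  local t="$1" days=0
  case "$t" in *-*) days="${t%%-*}"; t="${t#*-}" ;; esac
  local IFS=:
  set -- $t                          # split H:M:S (or M:S) on colons
  local secs=0 f
  for f in "$@"; do secs=$(( secs * 60 + 10#$f )); done   # 10# avoids octal "08"
  echo $(( 10#$days * 86400 + secs ))
}

# Allocated CPUs = CPUTime / Elapsed
alloc_cpus() {
  echo $(( $(to_seconds "$1") / $(to_seconds "$2") ))
}

alloc_cpus 1-14:42:20 01:56:07   # script.m
alloc_cpus 05:54:20 00:17:43     # pscript
```

Both jobs give 20, the number of tasks requested. The same conversion gives 6967 s versus 1063 s elapsed, a speedup of about 6.5 for ''pscript''.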
 +
 +
 +
 +====== Interactive job example ======
 +
 +The basic steps to run a [[:software:matlab:darwin#interactive-job|MATLAB]] job interactively, on a compute node that dedicates specific resources to your job, are shown below. 
 +
 +
 +
 +
 +
 +
 +==== Scheduling interactive job ====
 +Create a directory and add [[#matlab-function|maxEig.m]] and [[#slurm-script|script.m]] to it.
 +<code>
 +[(it_css:traine)@login01 ~]$ cd matlab_example
 +[(it_css:traine)@login01 matlab_example]$ mkdir matlab_interact
 +
 +[(it_css:traine)@login01 matlab_example]$ cp maxEig.m script.m matlab_interact/
 +[(it_css:traine)@login01 matlab_example]$ cd matlab_interact
 +[(it_css:traine)@login01 matlab_interact]$ ls
 +maxEig.m  script.m
 +</code>
 +Start an interactive session on a compute node with the ''salloc'' command. Include the options for the number of cores (''%%--%%ntasks'') and the partition (''%%--%%partition=standard'').
 +<code>
 +[(it_css:traine)@login00 matlab_interact]$ salloc --partition=standard --ntasks=20
 +salloc: Pending job allocation 4608120
 +salloc: Waiting for resource configuration
 +salloc: Nodes r1n[00,02] are ready for job
 +[traine@r01n10 matlab_interact]$
 +</code>
 +
 +==== Starting a command mode matlab session ====
 +<code>
 +[traine@r01n10 matlab_interact]$ vpkg_require matlab/2020b
 +Adding package `matlab/2020b` to your environment
 +[traine@r01n10 matlab_interact]$
 +</code>
 +<code>
 +[traine@r01n10 matlab_interact]$ matlab -nodesktop -nosplash
 +MATLAB is selecting SOFTWARE OPENGL rendering.
 +
 +                                       < M A T L A B (R) >
 +                             Copyright 1984-2019 The MathWorks, Inc.
 +                             R2020b (9.9.0.1467703) 64-bit (glnxa64)
 +                                       August 26, 2020
 +
 +
 +To get started, type doc.
 +For product information, visit www.mathworks.com.
 +
 +>>
 +</code>
 +
 +
 +
 +
 +
 +==== Using help as the first command ====
 +
 +<code>
 +>> help maxEig
 +   maxEig   Maximum Eigenvalue of a random matrix
 +      Input parameters
 +        sd - seed for uniform random generator
 +        dim - size of the square matrix (should be odd)
 +      Output value
 +        maxe - maximum real eigvalue
 +        
 +</code>
 +
 +==== Calling function once ====
 +
 +Use the ''tic'' and ''toc'' commands to report the elapsed time taken to generate the random matrix, find all its eigenvalues, and report the maximum real eigenvalue.
 +
 +<code>
 +>> tic; maxEig(1,5001); toc
 +
 +maxe =
 +
 +   70.0220
 +
 +Elapsed time is 24.476652 seconds.
 +</code>
 +
 +==== Finishing up ====
 +
 +<code>
 +>> exit
 +[traine@r1n00 matlab_interact]$ exit
 +exit
 +salloc: Relinquishing job allocation 4608120
 +[(it_css:traine)@login01 matlab_interact]$
 +
 +</code>
 +===== Interactive parallel toolbox example =====
 +This example is based on the ''matlab_interact'' directory that was created in the [[software:matlab:darwin#interactive-job|Interactive job example]] demo shown above.
 +
 +When you are using the Parallel Computing Toolbox, request a compute node with ''salloc'', specifying the partition, the number of tasks, and the memory required:
 +
 +<code>
 +[(it_css:traine)@login01 matlab_interact]$ salloc --partition=standard --ntasks=20 --mem=40G
 +salloc: Granted job allocation 4608122
 +salloc: Waiting for resource configuration
 +salloc: Nodes r1n[00,02] are ready for job
 +[traine@r1n00 matlab_interact]$ vpkg_require matlab/2020b
 +[traine@r1n00 matlab_interact]$ matlab -nodesktop -nosplash
 +</code>
 +  
 +This reserves 20 CPUs and 40G of memory for your interactive job.  The default number of parallel workers in the ''local'' profile is 12, but you can set the number of workers from the number of tasks requested, as the ''parpool'' commands below do.
 +
 +Here we request 20 workers with the ''parpool'' function, and then use ''parfor'' to send a different seed to each worker.  The output comes from the workers as they finish, so its order is not deterministic.
 +
 +<note important>**Make sure the workers are not doing exactly the same computations**  In this example, the different seed, passed to the function, causes all the random values to be different on each worker.</note>
 +
 +Because the 20 workers run in parallel, the 200 results are computed in about 200/20 = 10 rounds. Each single-threaded worker takes much longer per call than the single call timed above; in the run shown below, the elapsed time to complete all 200 results was about 1457 seconds. 
 +
 +<code>
 +MATLAB is selecting SOFTWARE OPENGL rendering.
 +
 +                                       < M A T L A B (R) >
 +                             Copyright 1984-2019 The MathWorks, Inc.
 +                             R2020b (9.9.0.1467703) 64-bit (glnxa64)
 +                                       August 26, 2020
 +
 +
 +To get started, type doc.
 +For product information, visit www.mathworks.com.
 +
 +>> myCluster = parcluster('local');
 +>> myCluster.NumWorkers = str2double(getenv('SLURM_NTASKS'));
 +>> myCluster.JobStorageLocation = getenv('TMPDIR');
 +>> myPool = parpool(myCluster, myCluster.NumWorkers);
 +Starting parallel pool (parpool) using the 'local' profile ...
 +Connected to the parallel pool (number of workers: 20).
 +>>  tic; parfor sd = 1:200; maxEig(sd,5001); end; toc
 +
 +maxe =
 +
 +   67.1320
 +
 +
 +maxe =
 +
 +   70.8721
 +
 +
 +maxe =
 +
 +   71.3507
 +
 +... skipped lines ...
 +
 +maxe =
 +
 +   70.2656
 +
 +
 +maxe =
 +
 +   69.8759
 +
 +
 +maxe =
 +
 +   70.9002
 +
 +Elapsed time is 1456.599991 seconds.
 +</code>
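The elapsed time above can be estimated with simple arithmetic. With ''count'' independent iterations and ''workers'' workers, ''parfor'' needs at least ceil(count/workers) rounds. The sketch below makes that estimate. The per-iteration time is an input you measure yourself; the run above implies roughly 1457/10, or about 146 seconds per iteration on a worker. The helper name is hypothetical.

```shell
#!/bin/bash
# Hypothetical estimate of parfor elapsed time (ignores pool start-up overhead)
#   $1 = iterations, $2 = workers, $3 = seconds per iteration on one worker
parfor_estimate() {
  local count="$1" workers="$2" per="$3"
  local rounds=$(( (count + workers - 1) / workers ))   # ceiling division
  awk -v r="$rounds" -v p="$per" 'BEGIN { printf "%d rounds, about %.0f seconds\n", r, r * p }'
}

parfor_estimate 200 20 146
```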
 +
 +Once the computation is complete, delete your pool, exit MATLAB, and release the interactive compute node by typing ''exit'':
 +
 +<code>
 +>> delete(myPool);
 +Parallel pool using the 'local' profile is shutting down.
 +>> exit
 +[traine@r1n00 matlab_interact]$ exit
 +exit
 +salloc: Relinquishing job allocation 4608122
 +[(it_css:traine)@login01 matlab_interact]$
 +</code>
 +====== MCR array job example ======
 +
 +Most Matlab functions can be compiled using the Matlab Compiler (''mcc'') and then deployed to run on the compute nodes in the MATLAB Compiler Runtime (MCR).  The MCR is a prerequisite for deployment, and is installed on all the compute nodes. You must use VALET to set up the libraries you will need to run your function from the command line.  You should **NOT** use the shell (''.sh'' file) that the Matlab compiler creates.
 +
 +There are two ways to run compiled MATLAB jobs in a shared environment, such as DARWIN.
 +  - Compile to produce an executable that uses a single computational thread specifying the MATLAB option ''-singleCompThread''
 +  - Submit the job to use the nodes exclusively specifying the Slurm option ''--exclusive''
 +
 +You can run more jobs on each node when they are compiled using just one core (Single Computational Thread).  This will give you higher throughput for an array job, but not higher performance.  
 +
 +
 +==== Example compiler commands ====
 +
 +Make a new directory ''MCR_array_II'' and then copy the [[#matlab-function|maxEig function]] from the ''matlab_example'' directory into it.
 +<code>
 +[(it_css:traine)@login01 ~]$ cd matlab_example
 +[(it_css:traine)@login01 matlab_example]$ mkdir MCR_array_II
 +[(it_css:traine)@login01 matlab_example]$ cp maxEig.m MCR_array_II/
 +[(it_css:traine)@login01 matlab_example]$ cd MCR_array_II
 +</code>
 +The [[#matlab-function|maxEig function]] has a conditional statement to make it work when deployed.
 +<code>
 +  if (isdeployed)
 +    sd = str2num(sd)
 +    dim = str2num(dim)
 +  end
 +</code>
 +All arguments of the function are taken as tokens on the shell command used to execute the program, 
 +and they are all strings.  You must convert numeric arguments from strings to numbers.  You can use the same variable names so 
 +that the rest of the function behaves the same whether it is deployed or executed directly in MATLAB.
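Because the compiled program receives its arguments as strings, a malformed token would only surface later inside ''maxEig''. Below is a minimal sketch of a stricter, shell-side check you could place in the job script before calling the executable. The function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical guard: succeed only if every argument is a non-negative integer
all_integers() {
  local a
  for a in "$@"; do
    case "$a" in
      ''|*[!0-9]*) echo "not an integer: '$a'" >&2; return 1 ;;
    esac
  done
}

# Usage in the job script:
#   all_integers "$seed" "$dim" || exit 1
#   UD_EXEC ${WORKDIR}/${USER}/sw/bin/maxEig $seed $dim
```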
 +
 +You can convert this function into a single-computational-thread executable with the MATLAB compiler ''mcc''. To do this, create a file ''compile.sh'' containing the lines below.
 +<code>
 +prog=maxEig
 +opt='-nojvm,-nodisplay,-singleCompThread'
 +version='2019b'
 +
 +vpkg_require matlab/$version
 +mcc -R "$opt" -mv $prog.m
 +
 +[ -d ${WORKDIR}/${USER}/sw/bin ] && mv $prog ${WORKDIR}/${USER}/sw/bin
 +</code>
 +
 +<note tip>**Keep these commands in a file**: Even though this is just two commands, we recommend you keep them, including the shell assignment statements, as a record of the MATLAB version and options you used to create the executable ''maxEig''. You will need to know these if you want to use the executable in a shell script.  You can source this file whenever you want to rebuild ''maxEig''.</note>
 +
 +<note tip>**You can get mcc usage instructions with ''mcc -help''**: The string following the ''-R'' flag is the list of MATLAB
 +options you want to use at run time.  The ''-m'' option tells mcc to build a standalone application to be deployed
 +using the MCR.  The ''-v'' option turns on verbose mode. </note>
 +
 +<note warning>You cannot execute a file from a directory on the ''lustre'' file system.  That is why the executable **''$prog''** is moved to the special directory, which is added to your path when a new workgroup shell is started or when a queue script is submitted.
 +<code>
 +[ -d ${WORKDIR}/${USER}/sw/bin ] && mv $prog ${WORKDIR}/${USER}/sw/bin
 +</code>
 +</note>
 +
 +==== Compiling commands ====
 +
 +Create the directory where the compiled ''maxEig'' executable will be placed.
 +
 +<code>
 +[(it_css:traine)@login01 MCR_array_II]$ mkdir -p ${WORKDIR}/${USER}/sw/bin
 +</code>
 +<note important>
 +If you have a permission error, check to make sure that you are in your workgroup.
 +</note>
 +Now request an interactive compute node and run the ''compile.sh'' script.
 +
 +<code>
 +[(it_css:traine)@login01 MCR_array_II]$ salloc --partition=standard
 +salloc: Granted job allocation 4608255
 +salloc: Waiting for resource configuration
 +salloc: Nodes r1n00 are ready for job
 +[traine@r1n00 MCR_array_II]$ ls
 +compile.sh  maxEig.m
 +[traine@r1n00 MCR_array_II]$ . compile.sh
 +Adding package `matlab/2019b` to your environment
 +Compiler version: 7.1 (R2019b)
 +Dependency analysis by REQUIREMENTS.
 +Parsing file "/home/1201/matlab_example/MCR_array_II/maxEig.m"
 +        (referenced from command line).
 +Generating file "/home/1201/matlab_example/MCR_array_II/readme.txt".
 +Generating file "run_maxEig.sh".
 +[traine@r00n56 MCR_array_II]$ ls
 +compile.sh  maxEig.m  mccExcludedFiles.log  readme.txt  requiredMCRProducts.txt  run_maxEig.sh
 +[traine@r00n56 MCR_array_II]$ exit
 +exit
 +salloc: Relinquishing job allocation 4608255
 +[(it_css:traine)@login01 MCR_array_II]$
 +
 +</code>
 +
 +
 +
 +==== Example queue script file ====
 +
 +The ''mcc'' command will generate a ''.sh'' file that should **not** be used.  This run script does not use VALET and does not have the appropriate Slurm commands.  Instead, you should copy the Slurm template in the file 
 +''/opt/shared/templates/slurm/applications/matlab-mcr.qs'' by using the following command
 +<code>
 +[(it_css:traine)@login00 MCR_array_II]$ cp /opt/shared/templates/slurm/applications/matlab-mcr.qs .
 +
 +</code>
 +and make the changes shown below.
 +<code>
 +...
 +20 #SBATCH --ntasks=2
 +...
 +29 #SBATCH --mem=3G
 +...
 +47 #SBATCH --job-name=maxEig
 +...
 +58 #SBATCH --partition=standard
 +...
 +85 #SBATCH --output=%x-%A-%3a.out
 +...
 +102 # Setting the job array options
 +103 #SBATCH --array=1-20:1
 +...
 +148 # Load a specific Matlab MCR package into the runtime environment:
 +149 #
 +150 vpkg_require mcr/2019b:nojvm
 +151 export MCR_CACHE_ROOT="$TMPDIR"
 +152
 +153 #
 +154 # Do standard MCR environment setup:
 +155 #
 +156 . /opt/shared/slurm/templates/libexec/matlab-mcr.sh
 +157
 +158 #
 +159 date "+Start %s"
 +160 echo "Host ${HOSTNAME}"
 +161
 +162
 +163 #Getting the task ID that will be passed as an argument
 +164 let seed=$SLURM_ARRAY_TASK_ID
 +165 let dim=5001
 +166
 +167 # Execute your MCR program(s) here; prefix with UD_EXEC to
 +168 # ensure the job can/will respond to preemption/termination
 +169 # signals by calling your UD_JOB_EXIT_FN.
 +170 #
 +171 # Duplicate all three commands for each MCR program you run
 +172 # in sequence below.
 +173 #
 +174 #UD_EXEC my_mcr_program arg1 arg2
 +175 #mcr_rc=$?
 +176 #if [ $mcr_rc -ne 0 ]; then exit $mcr_rc; fi
 +177 UD_EXEC ${WORKDIR}/${USER}/sw/bin/maxEig $seed $dim
 +178 mcr_rc=$?
 +179 if [ $mcr_rc -ne 0 ]; then exit $mcr_rc; fi
 +180
 +181 date "+Finish %s"
 +182
 +
 +
 +
 +</code>
 +
 +The two ''date'' commands record the start and finish times, in seconds, of each task; these are later used to compute the total runtime. The echoed hostname can be used to determine how the tasks overlapped on the compute nodes.  Since ''maxEig'' was compiled as a single-threaded program, each task's CPU time will be very close to its elapsed (wall-clock) time.  We do not send email notifications, since that would generate 20 email messages, one for each task.
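Each task's run time is simply the difference between the two ''date'' stamps. The sketch below reads one task output file and assumes the ''Start'', ''Finish'' and ''Host'' lines are written as in the script above. The helper name is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print "<host> <seconds>" for one array-task output file
task_runtime() {
  awk '
    /^Host /   { host = $2 }
    /^Start /  { start = $2 }
    /^Finish / { finish = $2 }
    END { if (start != "" && finish != "") print host, finish - start }
  ' "$1"
}

# Usage:  for f in maxEig-4610424-*.out; do task_runtime "$f"; done
```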
 +
 +==== Running Compiled Matlab Example In Workgroup And Analyzing Output Results ====
 +
 +To test the compiled MATLAB example in the ''it_css'' workgroup, we first compiled the code with ''mcc'' and
 +then submitted it with ''sbatch''.  
 +<code>
 +[(it_css:traine)@login00 MCR_array_II]$ sbatch matlab-mcr.qs
 +Submitted batch job 4610424
 +</code>
 +
 +The assigned job ID is 4610424.  After a few minutes, 20 files were created in the current directory:
 +    maxEig-4610424-001.out  ...   maxEig-4610424-020.out
 +They each had the output of one task.  For example, for taskid 12:
 +<code>
 +Adding package `mcr/2019b:nojvm` to your environment
 +-- Matlab MCR environment setup complete (on r1n10):
 +--  MCR_ROOT             = /opt/shared/matlab/2019b
 +--  MCR_CACHE_ROOT       = /tmp
 +
 +Start 1696620250
 +Host r1n10
 +
 +sd =
 +
 +    12
 +
 +
 +dim =
 +
 +        5001
 +
 +
 +maxe =
 +
 +   67.8668
 +
 +Finish 1696620347
 +</code>
 +
 +Now we will use [[#perl-script-for-compiled-matlab|wikigather.pl]] to gather all the information from these files and compute the avgMaxEig value.
 +Follow the link and download the Perl code into a file named ''wikigather.pl'' in your current directory (see the note above the script for why copying and pasting is not recommended).
 +
 +<note important>
 +After copying the code, you will need to make sure that you change the job id value in the pattern variable to match **your job id** 
 +</note>
 +
 +<code>
 +[(it_css:traine)@login00 MCR_array_II]$ perl wikigather.pl
 +avgMaxEig = 69.020935
 +</code>
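If Perl is not convenient, you can gather the ''maxe'' column and its average with a short shell loop. Below is a minimal sketch. It assumes output files named ''maxEig-<JOBID>-<task>.out'' and the ''name ='', blank line, value layout shown above. It does not produce the timing summary or plots that ''wikigather.pl'' creates, and the function name is hypothetical.

```shell
#!/bin/bash
# Hypothetical alternative to the result part of wikigather.pl
#   $1 = job ID, e.g. 4610424 ; reads maxEig-$1-NNN.out in the current directory
gather_maxe() {
  local jobid="$1" f
  for f in maxEig-"$jobid"-*.out; do
    [ -e "$f" ] || continue
    # sd, dim and maxe are each printed as "name =", a blank line, then the value
    awk '
      /^(sd|dim|maxe) =/ { key = $1; next }
      key != "" && NF == 1 { val[key] = $1; key = "" }
      END { print val["sd"], val["dim"], val["maxe"] }
    ' "$f"
  done | sort -n | awk '
    BEGIN { print "sd dim maxe" }
    { print; sum += $3; n++ }
    END { if (n) printf "avgMaxEig = %f\n", sum / n }'
}

# Usage:  gather_maxe 4610424
```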
 +The script also creates three ''.data'' files and one ''.txt'' file. We are mainly interested in ''result4610424.data'' and ''wiki4610424.txt''. Examples of them are shown below.
 +<code result4610424.data>
 +sd dim maxe
 +1 5001 70.0220
 +2 5001 71.7546
 +3 5001 70.8331
 +4 5001 70.5714
 +5 5001 69.4923
 +6 5001 67.7814
 +7 5001 70.5037
 +8 5001 68.3293
 +9 5001 69.5694
 +10 5001 62.0057
 +11 5001 69.3870
 +12 5001 67.8668
 +13 5001 66.8721
 +14 5001 69.2772
 +15 5001 66.8973
 +16 5001 70.6231
 +17 5001 69.5337
 +18 5001 68.5267
 +19 5001 70.4993
 +20 5001 70.0726
 +</code>
 +
 +The script also gathers the start and finish times (in seconds) and the host name of each task, and summarizes them per node:
 +
 +=== wiki4610424.txt Output: ===
 +SGE array job started Fri 06 Oct 2023 03:24:10 PM EDT
 +
 +Used a total of 1850 CPU seconds over 100 seconds of elapsed time on 3 nodes
 +^              Node  ^^               Real Clock Time  ^^^       Ratio  ^
 +^    Name  ^   Count  ^      Min  ^      Max  ^  Average  ^   User/Real  ^
 +|   r1n02|       2|     86.00|    86.00|    86.00 |     1.00000|
 +|   r1n04|       8|     79.00|    99.00|    92.62 |     1.00000|
 +|   r1n10|      10|     87.00|    97.00|    93.70 |     1.00000|
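
The Count column in this table is the number of tasks whose output reports each host. Below is a quick cross-check from the task output files. It assumes the ''Host'' lines written by the job script, and the helper name is hypothetical.

```shell
#!/bin/bash
# Hypothetical cross-check: number of array tasks that ran on each node
tasks_per_host() {
  # Print the host name from every "Host <name>" line, then count duplicates
  awk '/^Host / { print $2 }' "$@" | sort | uniq -c | awk '{ print $2, $1 }'
}

# Usage:  tasks_per_host maxEig-4610424-*.out
```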
 +
 +
 +Using gnuplot, we get a time chart of the tasks on the 3 nodes and of the total CPU usage.
 +Create a file named ''plot<<JOB ID>>.gnuplot'' and add the following code to it.
 +<code plot4610424.gnuplot>
 + set terminal png  size 640,640
 + set output "wiki4610424.png"
 +  set multiplot layout 2,1
 +  set xrange [0:120]
 +  set yrange [0:30]
 +  set key on
 +  set title "Tasks on 3 nodes by time (seconds)"
 +  set key on
 +  plot "count.data" u 1:4 t "r1n10.localdomain.hpc.udel.edu" w filledcurves,"count.data" u 1:3 t "r1n04.localdomain.hpc.udel.edu" w filledcurves, "count.data" u 1:2 t "r1n02.localdomain.hpc.udel.edu" w filledcurves
 +  set title "User time usage rate on all nodes"
 +  plot "usage.data" u 1:2 w steps t "CPU"
 +
 +</code>
 +<note important>You will need to update the "plot" line with the correct number of nodes and their respective names for the job you ran.</note>
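Since the plot line depends on the nodes used, you can generate it from the node list instead of editing it by hand. The sketch below assumes ''count.data'' has one cumulative column per node, in sorted node order starting at column 2, which is how ''wikigather.pl'' writes it. The largest column is drawn first so the filled curves stack. The helper name is hypothetical.

```shell
#!/bin/bash
# Hypothetical generator for the gnuplot "plot" line used above
#   arguments: node names in sorted order (same order as the columns of count.data)
plot_line() {
  local nodes=("$@") i parts=()
  # Largest cumulative column first, so smaller curves are drawn on top
  for (( i = ${#nodes[@]} - 1; i >= 0; i-- )); do
    parts+=("\"count.data\" u 1:$(( i + 2 )) t \"${nodes[i]}\" w filledcurves")
  done
  local IFS=,
  echo "plot ${parts[*]}"
}

plot_line r1n02 r1n04 r1n10
```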
 +To create the plot, request an interactive compute node. Once the request has been granted, use VALET to load the gnuplot application and run the ''plot<<JOB ID>>.gnuplot'' script that we just created. After the script runs, release the node and then view the ''.png'' file that the script created. 
 +<code>
 +[(it_css:traine)@login00 MCR_array_II]$ salloc --partition=standard
 +salloc: Granted job allocation 4610467
 +salloc: Waiting for resource configuration
 +salloc: Nodes r1n02 are ready for job
 +[traine@r00n56 MCR_array_II]$ vpkg_require gnuplot
 +Adding package `gnuplot/5.4.5` to your environment
 +[traine@r00n56 MCR_array_II]$ gnuplot plot4610424.gnuplot
 +[traine@r00n56 MCR_array_II]$ exit
 +[(it_css:traine)@login00 MCR_array_II]$ display wiki4610424.png
 +</code>
 +<note important>Make sure that you have X11 forwarding setup on your SSH connection to be able to view the image.</note>
 +An example of the ''.png'' file that is created by the [[#compiled-matlab-in-owner-queues|plot4610424.gnuplot]] script.
 +{{ :software:matlab:wiki4610424.png?600 |}}
 +
 +
 +
 +==== Perl Script For Compiled Matlab  ====
 +<note important>
 +**DO NOT COPY AND PASTE THIS CODE; IT WILL MOST LIKELY NOT FORMAT CORRECTLY, WHICH WILL BREAK THE CODE. INSTEAD, DOWNLOAD THE FILE WITH ''WGET'' DIRECTLY TO THE LOCATION WHERE YOU WANT IT, OR DOWNLOAD IT LOCALLY AND TRANSFER IT TO DARWIN WITH SCP.**
 +</note>
 +
 +<file perl wikigather.pl>
 +$pattern = '\-(9805558)\-(\d+)\.(out)'; # Make sure to change 9805558 to your job id.
 +$countFile = 'count.data'; # task count on nodes by seconds
 +$usageFile = 'usage.data'; # accumulate user time on all nodes by seconds
 +$nodeUsageFile = "nodeusage.data"; #detail of node usage by seconds
 +$nodeUsageFiles = "%s_usage.data"; # %s -> host
 +@varNames = qw/sd dim maxe/; # used for columns in resultfile
 +$resultFile = "result%s.data"; # %s -> project id
 +&scandir(".");
 +
 +@node = sort keys %hostCount;
 +
 +foreach $jobid (keys %startTime) {
 +  my $file = sprintf "wiki%s.txt", $jobid;
 +  open(WIKI, ">$file");
 +  print WIKI `date -d \@$startTime{$jobid} +\"SGE array job started %c\n"`;
 +  print WIKI "Used a total of $userTotal{$jobid} CPU seconds ";
 +  print WIKI "over ",$stopTime{$jobid}-$startTime{$jobid}," seconds of elapsed time ";
 +  print WIKI "on ",0+@node," nodes\n";
 +
 +  $baseTime = $startTime{$jobid} if (!defined $baseTime or $startTime{$jobid} < $baseTime);
 +
 +  $avgMaxEig=0;
 +  $count=0;
 +  if ($resultFile) {
 +    my $file = sprintf $resultFile, $jobid;
 +    open(DATA, ">$file");
 +    print DATA "@varNames\n";
 +    foreach $task (sort { $a <=> $b } keys %{$result{$jobid}}) {
 +      my %var = split($;,$result{$jobid}{$task});
 +      print DATA "@var{@varNames}\n";
 +      $avgMaxEig += $var{'maxe'};
 +      $count += 1;
 +    }
 +    close(DATA);
 +    print 'avgMaxEig = ', $avgMaxEig/$count, "\n";
 +  }
 +
 +  printf WIKI "^ %18s ^^ %30s ^^^ %12s ^\n","Node ","Real Clock Time ","Ratio ";
 +  printf WIKI "^ %8s ^ %8s ^ %9s ^ %9s ^ %9s ^ %12s ^\n","Name ","Count ","Min ","Max ","Average ","User/Real ";
 +  foreach (@node) {
 +    if ( $hostCountByJob{$jobid}{$_} > 0) {
 +      printf WIKI "|%8s|%8d| %9.2f|%9.2f|%9.2f |%12.5f|\n", $_, $hostCountByJob{$jobid}{$_},
 +        $hostRealMin{$jobid}{$_},$hostRealMax{$jobid}{$_}, 
 +        $hostReal{$jobid}{$_}/$hostCountByJob{$jobid}{$_}, 
 +        $hostUser{$jobid}{$_}/$hostReal{$jobid}{$_};
 +    }
 +  }
 +  close(WIKI);
 +}
 +
 +if ($countFile and open(DATA,">$countFile")) {
 +  my(@col,%byNode,$time,$count);
 +  $col[$_] = 0 for $[ .. $#node;
 +  foreach $time (sort { $a <=> $b } keys %timeCount) {
 +    printf DATA "%d %s\n", $time-$baseTime, "@col";
 +    $byNode{$_} += $timeCount{$time}{$_} foreach keys %{$timeCount{$time}};
 +    $count=0;
 +    $col[$_] = $count += $byNode{$node[$_]} for $[ .. $#node;
 +    printf DATA "%d %s\n", $time-$baseTime, "@col";
 +  }
 +  close(DATA);
 +}
 +
 +if ($usageFile and open(DATA,">$usageFile")) {
 +  my ($time, $lastTime, $slope, $usage);
 +  foreach $time (sort { $a <=> $b } keys %timeRate) {
 +    $usage += $slope*($time - $lastTime);
 +    $slope += $timeRate{$time}{$_} foreach keys %{$timeRate{$time}};
 +    printf DATA "%d %.4f %.4f\n", $time-$baseTime, $slope, $usage;
 +    $lastTime = $time;
 +  }
 +  close(DATA);
 +}
 +
 +if ($countFile and $usageFile) {
 +  foreach $jobid (keys %startTime) {
 +    my $plotTitle = 'Number of tasks on %s by time (seconds)'; # %s -> nodes
 +    my(@plot);
 +    $plot[$_] = "\"$countFile\" u 1:".(2+$_-$[)." t \"$node[$_]\" w filledcurves"
 +      for  $[ .. $#node;
 +    my $plotTop = join(",",reverse @plot);
 +    my $titleTop = sprintf $plotTitle, 0+@node." nodes";
 +    my $key = "off";
 +    my ($t1,$t2) = (30*int(($startTime{$jobid}-$baseTime)/30),30*int(2+($stopTime{$jobid}-$baseTime)/30));
 +    $titleTop = sprintf $plotTitle, "nodes @node" if $#node < 5;
 +    $key = "out horiz top right" if $#node < 9;
 +  
 +    open (PLOT, "| gnuplot" );
 +    print PLOT <<"EOP";
 +  set term pngcairo font "sans,10" size 640,640
 +  set output "wiki$jobid.png"
 +  set multiplot layout 2,1
 +  set xrange [$t1:$t2]
 +  set key $key
 +  set ylabel "Number of Tasks on node"
 +  plot $plotTop
 +  set key out horiz top right
 +  set ylabel "Total CPU usage"
 +  set xlabel "Time (seconds)"
 +  plot "$usageFile" u 1:3 w lines t "CPU seconds"
 +EOP
 +  }
 +}
 +
 +sub scanfile {
 +   my($file) = @_;
 +   my($jobid,$taskid) = ($file =~ /$pattern/);
 +   my($host,$start,$finish,$usr1,$usr2,$real,$user,$sys,$lhs,%var);
 +   open(FILE,$file) || next;
 +   local $/ = undef;  #Read file as one string
 +   while (<FILE>) {
 +      study;
 +      /^Host (\S+)/m and $host=$1;
 +      /^Start (\d+)/m and $start=$1;
 +      /^Finish (\d+)/m and $finish=$1;
 +      /^SIGUSR1 (\d+)/m and $usr1=$1;
 +      /^SIGUSR2 (\d+)/m and $usr2=$1;
 +      /^real(.*?)m(.*?)s/m and $real=60*$1+$2;
 +      /^user(.*?)m(.*?)s/m and $user=60*$1+$2;
 +      /^sys(.*?)m(.*?)s/m and $sys=60*$1+$2;
 +      while(/(\S+)\s*=\s*(.*)/g) { $var{$1}=$2 };
 +   }
 +   close(FILE);
 +   $result{$jobid}{$taskid} = join($;,%var);
 +   $SGEfile{$file} = sprintf "| %s | %.2f %8.2f %8.2f |", $host, $real, $user, $sys;
 +   $SGEfile{$file} .= sprintf " %d %d |", $usr1, $usr2;
 +   $SGEfile{$file} .= join(',', map {" $_=$var{$_}"} keys %var );
 +   $finish = $usr2 if( $finish==0 );
 +   $finish = $usr1 if( $finish==0 );
 +   $finish > 0 || next;
 +   $real = $finish-$start if($real==0);
 +   $user = $real-$sys if($user==0);
 +   $startTime{$jobid} = $start if (!defined $startTime{$jobid} or $start < $startTime{$jobid});
 +   $stopTime{$jobid} = $finish if (!defined $stopTime{$jobid} or $finish > $stopTime{$jobid});
 +   $userTotal{$jobid} += $user;
 +
 +   $hostCount{$host} += 1;
 +   $hostCountByJob{$jobid}{$host} += 1;
 +   $hostReal{$jobid}{$host} += $real;
 +   $hostRealMax{$jobid}{$host} = $real if (!defined $hostRealMax{$jobid}{$host} or 
 +                                          $real > $hostRealMax{$jobid}{$host});
 +   $hostRealMin{$jobid}{$host} = $real if (!defined $hostRealMin{$jobid}{$host} or 
 +                                          $real < $hostRealMin{$jobid}{$host});
 +   $hostUser{$jobid}{$host} += $user;
 +
 +   $timeCount{$start}{$host} += 1;
 +   $timeCount{$finish}{$host} -= 1;
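 +   if ($finish > $start) {  # avoid division by zero when start and finish fall in the same second
 +      $timeRate{$start}{$host} += $user/($finish-$start);
 +      $timeRate{$finish}{$host} -= $user/($finish-$start);
 +   }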
 +}
 +
 +sub scandir {
 +   my($basedir) = @_;
 +   my(@file,@dir);
 +
 +   opendir(DIR, $basedir) || return;
 +   foreach ( grep (/^[^\.]/,readdir(DIR)) ) { # ignore hidden files
 +      next if -l "$basedir/$_" ; # skip sym links
 +      push @file,$_ if /$pattern/; # save files with this pattern
 +      push @dir,$_ if -d "$basedir/$_" ; # save directories for recursion
 +   }
 +   closedir(DIR);
 +
 +   foreach (@file) {
 +      &scanfile("$basedir/$_");
 +   }
 +
 +   foreach (@dir) {
 +      &scandir("$basedir/$_");
 +   }
 +}
 +
 +</file>
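The ''scanfile'' subroutine above expects each task's output file to contain ''Host'', ''Start'' and ''Finish'' lines (the times are assumed here to be epoch seconds), optional ''SIGUSR1''/''SIGUSR2'' lines, the ''real''/''user''/''sys'' lines of a bash ''time'' report, and any ''key = value'' results. The code that writes those files is not part of this section, so the following is only a hypothetical bash sketch (''run_task'' is an illustrative name, not part of the DARWIN templates) of a wrapper whose output those regular expressions would match:

```shell
#!/bin/bash
# Hypothetical task wrapper: prints the lines that the Perl scanfile()
# subroutine searches for. Usage: run_task COMMAND [ARGS...]
run_task() {
  echo "Host ${HOSTNAME:-unknown}"
  echo "Start $(date +%s)"
  # Record when (epoch seconds) SIGUSR1/SIGUSR2 arrive, if they ever do.
  trap 'echo "SIGUSR1 $(date +%s)"' USR1
  trap 'echo "SIGUSR2 $(date +%s)"' USR2
  # The bash "time" keyword reports on stderr; this format yields lines such
  # as "real<TAB>0m1.234s", which match /^real(.*?)m(.*?)s/m in scanfile.
  local TIMEFORMAT=$'real\t%3lR\nuser\t%3lU\nsys\t%3lS'
  { time "$@" ; } 2>&1
  echo "Finish $(date +%s)"
  trap - USR1 USR2
}
```

For example, ''run_task matlab -batch "script" > task.out'' would write a file that ''scanfile'' can parse, provided its name matches ''$pattern''; any ''key = value'' lines the command prints end up in ''%var''.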
 +
 +====== Adding checkpoints Matlab job example ======
 +
 +Adding [[abstract:caviness:runjobs:schedule_jobs#handling-system-signals-aka-checkpointing|checkpoints]] to your Matlab job lets it handle kill signals from the system gracefully, for example the signal sent when the job reaches its wall-clock limit. Handling these signals properly lets you restart your job from where it stopped instead of starting over. In the following example, we modify the script and batch file used earlier to record which loop iteration the job reached when it timed out, and to resume from that iteration after the job is requeued.
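The core idea can be shown with a minimal bash sketch that is independent of the DARWIN ''UD_JOB_EXIT_FN'' machinery used later: when a job reaches its time limit, Slurm sends ''SIGTERM'' before it kills the job, so a ''trap'' on that signal can report the last completed iteration. The names ''on_term'', ''work_loop'' and ''last_done'' are hypothetical.

```shell
#!/bin/bash
# Hypothetical sketch: remember the last completed iteration and report it
# if the shell receives SIGTERM (e.g. at the Slurm time limit).
last_done=0

on_term() {
  echo "ended on counter: ${last_done}"
  trap - TERM   # do not run the handler a second time
  exit 0
}

work_loop() {   # usage: work_loop FIRST LAST
  local i
  for (( i = $1; i <= $2; i++ )); do
    sleep 1     # stand-in for one iteration of real work
    last_done=$i
    echo "counter: $i"
  done
}

trap on_term TERM
```

Note that bash defers a trap until the current foreground command returns, so if only the shell itself is signalled, a long-running command delays the handler.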
 +
 +
 +==== Gathering code for job example ====
 +First, we'll create a new directory and copy the needed code into it. 
 +
 +<code>
 +[(it_css:traine)@login01 ~]$ cd matlab_example
 +[(it_css:traine)@login01 matlab_example]$ mkdir matlab_checkpoint
 +[(it_css:traine)@login01 matlab_example]$ cd matlab_checkpoint
 +[(it_css:traine)@login01 matlab_checkpoint]$ cp /opt/shared/templates/slurm/generic/serial.qs batch.qs
 +</code>
 +
 +Also put a copy of [[#matlab-function|maxEig.m]] and [[#matlab-script|script.m]] into your ''matlab_checkpoint'' directory.
 +
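 +Now we need to change ''script.m''. On the first run, the loop starts at 1 as before and prints a ''counter: N'' line after each iteration. If the job has been requeued, Slurm sets ''SLURM_RESTART_COUNT''; the script then reads the last ''ended on counter'' line that the exit handler wrote to the job's output file and resumes the loop from that counter instead of from 1.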
 +<code>
 +% script to run maxEig function 200 times and print average.
 +count = 200;
 +dim = 5001;
 +sumMaxe = 0;
 +id = str2num(getenv('SLURM_JOB_ID'));
 +rc = str2num(getenv('SLURM_RESTART_COUNT')); % empty on the first run
 +tic;
 +if isempty(rc)
 +   for i = 1:count
 +      sumMaxe = sumMaxe + maxEig(i,dim);
 +      counter = "counter: " + i; %Add this line
 +      disp(counter);             %Add this line
 +   end
 +else
 +   filename = ['checkpoint-', num2str(id), '.out']; % job output file (%x-%j.out in batch.qs)
 +   searchString = 'ended on counter';                % line written by job_exit_handler
 +   fileID = fopen(filename, 'r');                    % open the output file for reading
 +   if fileID == -1
 +      error('Unable to open the file %s.', filename);
 +   end
 +   counterNumber = 1; % fall back to the first iteration if no checkpoint line is found
 +
 +   % Read the file line by line; keep the counter from the LAST matching line
 +   while ~feof(fileID)
 +      tline = fgetl(fileID);
 +      if ischar(tline) && contains(tline, searchString)
 +         num = regexp(tline, 'counter:\s(\d+)', 'tokens');
 +         counterNumber = str2double(num{1}{1}); % record the counter number
 +      end
 +   end
 +   fclose(fileID); % close the file
 +
 +   % Once the job has restarted, continue from the last recorded counter
 +   for i = counterNumber:count
 +      sumMaxe = sumMaxe + maxEig(i,dim);
 +      counter = "counter: " + i;
 +      disp(counter);
 +   end
 +end
 +toc
 +avgMaxEig = sumMaxe/count
 +quit
 +
 +</code>
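The search that ''script.m'' performs, finding the last line containing ''ended on counter'', can also be done from the shell, which is handy for checking where a requeued job will resume. A hypothetical helper, ''last_checkpoint'', sketched under the assumption that the output file is the ''checkpoint-<jobid>.out'' file used in this example; it prints nothing when no checkpoint line exists yet, which corresponds to the first run:

```shell
#!/bin/bash
# Hypothetical helper: print the counter from the last "ended on counter: N"
# line of a job output file (the value script.m resumes from), or nothing.
last_checkpoint() {   # usage: last_checkpoint FILE
  grep -o 'ended on counter: [0-9][0-9]*' "$1" 2>/dev/null \
    | tail -n 1 \
    | grep -o '[0-9][0-9]*$'
}
```

Run on the output file shown in the next section after both runs, ''last_checkpoint checkpoint-4610102.out'' would print ''21''.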
 +
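 +Add the following changes to ''batch.qs''. The option ''--requeue'' allows the job to be requeued, that is, restarted under the same job ID, and ''--open-mode=append'' makes each restart append to the same output file, which is where ''script.m'' looks for the last counter. Always set an integer maximum restart count (''max_restarts'') so the job cannot requeue itself indefinitely. There is no preemption in the ''standard'' partitions, so for the checkpoint job we use the ''idle'' partition.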
 +<code>
 +...
 +#SBATCH --job-name=checkpoint
 +...
 +#SBATCH --time=0-00:40:00
 +...
 +#SBATCH --output=%x-%j.out
 +#SBATCH --error=%x-%j.out
 +...
 +#SBATCH --mail-user='traine@udel.edu'
 +#SBATCH --mail-type=END,FAIL,TIME_LIMIT_90
 +#SBATCH --requeue # allow job requeue
 +#SBATCH --open-mode=append # the output will append
 +#SBATCH --partition=idle
 +...
 +max_restarts=1  # only requires a single restart
 +scontext=$(scontrol show job "$SLURM_JOB_ID")
 +restarts=$(echo "$scontext" | grep -o 'Restarts=[0-9]*' | cut -d= -f2) # get the restart number (any number of digits)
 +job_exit_handler() {
 +  # last "counter: N" line that script.m wrote to this job's output file
 +  counter=$(grep '^counter:' "${SLURM_JOB_NAME}-${SLURM_JOB_ID}.out" | tail -n 1)
 +  echo "Job ${SLURM_JOB_ID} ended on ${counter}"
 +  # Don't call again on EXIT signal, please:
 +  trap - EXIT
 +  if [[ $restarts -lt $max_restarts ]]; then
 +    scontrol requeue "${SLURM_JOB_ID}" # automatically resubmit the job once
 +    exit 0
 +  else
 +    echo "Your job is over the Maximum restarts limit"
 +    exit 1
 +  fi
 +}
 +
 +export UD_JOB_EXIT_FN=job_exit_handler
 +...
 +#
 +#srun date
 +export UD_JOB_EXIT_FN_SIGNALS="SIGTERM EXIT"
 +# Loading MATLAB
 +vpkg_require matlab/2019b
 +# Running the MATLAB script; on error, print the full report and exit with a non-zero status
 +UD_EXEC matlab -nodisplay -nojvm -batch "try; script; catch ERR; disp(getReport(ERR,'extended')); quit(1); end"
 +
 +</code>
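Slurm also exports ''SLURM_RESTART_COUNT'' to a job that has been requeued (this is the variable ''script.m'' checks), so the restart number can be taken from there, falling back to the ''Restarts='' field of ''scontrol show job''. A hypothetical sketch of that logic; ''restart_count'' and ''should_requeue'' are illustrative names, and the ''scontrol'' text is read from standard input so the parsing can be tried without a running job:

```shell
#!/bin/bash
# Hypothetical sketch: decide whether an exit handler should requeue the job.
# restart_count: prefer $SLURM_RESTART_COUNT (unset on the first run);
# otherwise parse "Restarts=N" from `scontrol show job` text on stdin.
restart_count() {
  if [[ -n "${SLURM_RESTART_COUNT:-}" ]]; then
    echo "$SLURM_RESTART_COUNT"
    return
  fi
  local n
  n=$(grep -o 'Restarts=[0-9][0-9]*' | head -n 1 | cut -d= -f2)
  echo "${n:-0}"
}

# should_requeue RESTARTS MAX: succeeds (exit status 0) while RESTARTS < MAX.
should_requeue() {
  (( $1 < $2 ))
}
```

Inside ''batch.qs'' this would be used as ''restarts=$(scontrol show job "$SLURM_JOB_ID" | restart_count)'' followed by ''if should_requeue "$restarts" "$max_restarts"; then scontrol requeue "$SLURM_JOB_ID"; fi''. The pattern ''[0-9][0-9]*'' also reads multi-digit counts such as ''Restarts=12''.
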
 +==== Running the checkpoint job and its output ====
 +We know from the MCR example that this script takes 2 to 3 hours to run. In ''batch.qs'' we deliberately set a short wall-clock limit (''--time''), which ensures that the script cannot finish before the wall clock runs out, so the checkpoint and restart logic is exercised. This is shown in the following job submission example.
 +
 +<code>
 +[(it_css:traine)@login01 matlab_checkpoint]$ sbatch batch.qs
 +Submitted batch job 8382365
 +</code>
 +After the wall clock runs out, we see the following output:
 +<code>
 +[(it_css:traine)@login01 matlab_checkpoint]$ less checkpoint-4610102.out
 +Adding package `matlab/2019b` to your environment
 +-- Registered exit function 'job_exit_handler' for signal(s) SIGTERM EXIT
 +
 +
 +maxe =
 +
 +   70.0220
 +
 +counter: 1
 +
 +maxe =
 +
 +   71.7546
 +
 +counter: 2
 +
 +maxe =
 +
 +   70.8331
 +
 +counter: 3
 +...
 +maxe =
 +
 +   69.3870
 +
 +counter: 11
 +slurmstepd: error: *** JOB 4610102 ON r2l13 CANCELLED AT 2023-10-06T10:47:48 DUE TO TIME LIMIT ***
 +Job 4610102 ended on counter: 11
 +Adding package `matlab/2019b` to your environment
 +-- Registered exit function 'job_exit_handler' for signal(s) SIGTERM EXIT
 +
 +
 +maxe =
 +
 +   69.3870
 +
 +counter: 11
 +
 +maxe =
 +
 +   67.8668
 +
 +counter: 12
 +
 +maxe =
 +
 +   66.8721
 +
 +counter: 13
 +...
 +
 +maxe =
 +
 +   70.4993
 +
 +counter: 19
 +
 +maxe =
 +
 +   70.0726
 +
 +counter: 20
 +
 +maxe =
 +
 +   71.0748
 +
 +counter: 21
 +slurmstepd: error: *** JOB 4610102 ON r1n02 CANCELLED AT 2023-10-06T11:05:18 DUE TO TIME LIMIT ***
 +Job 4610102 ended on counter: 21
 +Your job is over the Maximum restarts limit
 +
 +</code>
 +
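 +The output shows that the first run completed 11 of the 200 loop iterations before the wall clock expired. The exit handler then requeued the job once, and the second run resumed at iteration 11 (the last recorded counter, which is computed again) instead of starting from the beginning. The second run reached the time limit at iteration 21 of 200, and because ''max_restarts'' had been reached, the job was not requeued again. Note that only the loop counter is checkpointed: ''sumMaxe'' starts again from zero after a restart, so the final ''avgMaxEig'' is only correct if the partial sum is saved and restored as well.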
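At the rate seen here, roughly ten new iterations per run, it is worth estimating how many runs the full 200-iteration loop needs before choosing ''max_restarts''. A small arithmetic sketch, assuming a constant number of new iterations per run (real runs will vary); ''runs_needed'' is a hypothetical name:

```shell
#!/bin/bash
# Hypothetical helper: ceil((TOTAL - DONE) / PER_RUN), the number of further
# runs needed to finish the loop if each run completes PER_RUN new iterations.
runs_needed() {   # usage: runs_needed TOTAL DONE PER_RUN
  local remaining=$(( $1 - $2 ))
  (( remaining < 0 )) && remaining=0
  echo $(( (remaining + $3 - 1) / $3 ))
}
```

''runs_needed 200 0 10'' prints ''20'', so finishing from scratch at this rate would take about 20 runs, that is, ''max_restarts=19''; after the two runs above, ''runs_needed 200 21 10'' prints ''18'' further runs.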
 +
 +<note tip>If you don't want to wait for the full wall-clock time, you can use the ''scancel'' command to stop the job manually. This also triggers the ''job_exit_handler()'' function, so the job is requeued once, as shown for job 4610071:
 +<code>
 +[(it_css:traine)@login01 matlab_checkpoint]$ sbatch batch.qs
 +Submitted batch job 4610071
 +[(it_css:traine)@login01 matlab_checkpoint]$ scancel 4610071
 +[(it_css:traine)@login01 matlab_checkpoint]$ cat checkpoint-4610071.out
 +-- Registered exit function 'job_exit_handler' for signal(s) SIGTERM
 +
 +Adding package `matlab/2019b` to your environment
 +
 +maxe =
 +
 +   70.0220
 +
 +counter: 1
 +
 +maxe =
 +
 +   71.7546
 +
 +counter: 2
 +
 +maxe =
 +
 +   70.8331
 +
 +counter: 3
 +slurmstepd: error: *** JOB 4610071 ON r2l01 CANCELLED AT 2023-10-06T09:39:03 ***
 +Job 4610071 ended on counter: 3
 +Adding package `matlab/2019b` to your environment
 +-- Registered exit function 'job_exit_handler' for signal(s) SIGTERM EXIT
 +
 +
 +maxe =
 +
 +   70.8331
 +
 +counter: 3
 +...
 +</code>
 +</note>