Differences
This shows you the differences between two versions of the page.
software:matlab:caviness [2022-01-10 14:05] – [Scheduling exclusive interactive job] anita | software:matlab:caviness [2023-10-06 10:29] (current) – [Running the checkpoint job and its output] thuachen | ||
---|---|---|---|
Line 1: | Line 1: | ||
+ | ====== Matlab on Caviness ====== | ||
+ | For use on Caviness, MATLAB projects should be developed using a Desktop installation of MATLAB and then copied to Caviness | ||
+ | to be run in batch. | ||
+ | An extended MATLAB example is considered, involving one simple MATLAB function and two MATLAB scripts: one to execute | ||
+ | this function in a loop, and another to execute it in parallel using the Parallel Computing Toolbox. | ||
+ | |||
+ | Details on how to run these two scripts in batch are given with the resulting output files. | ||
+ | A section with UNIX commands you can use to watch your jobs and gather [[#timing and core count]] numbers is also included. | ||
+ | It is important to know | ||
+ | how much memory will be needed and how many cores will be used to set your resource requirements. If you do not ask for enough memory your job will fail. If you do not ask for enough cores, the job will take longer. | ||
+ | |||
+ | Even though it is easier to develop on a desktop, MATLAB can be run interactively on Caviness; however, this is not recommended for scripts that are long and computationally intensive. | ||
+ | Two interactive jobs are demonstrated. | ||
+ | second example shows an interactive session, which starts multiple MATLAB pool of workers to execute the function in a loop using the Parallel Computing toolbox command, **'' | ||
+ | The Parallel Computing toolbox gives a faster time to completion, but more memory and CPU resources are consumed. | ||
+ | |||
+ | You can run [[: | ||
+ | |||
+ | Many MATLAB research projects fall in the "high throughput computing" | ||
+ | Thus we have a | ||
+ | final example that gives the recommended workflow to scale your job to multiple nodes: compile the MATLAB code with the single-thread option and deploy the job as a Slurm array job. | ||
+ | |||
+ | <note important> | ||
+ | The MATLAB Distributed Computing Server (MDCS), now referred to as MATLAB Parallel Server, is not installed on Caviness. | ||
+ | An array job of compiled MATLAB code is recommended for large jobs. | ||
+ | </ | ||
+ | |||
+ | |||
+ | ====== Getting Started ====== | ||
+ | Several examples are covered in the following sections. To make them easier to follow, it is suggested that you make a new directory in your home directory '' | ||
+ | < | ||
+ | [traine@login00 ~]$ mkdir matlab_example | ||
+ | [traine@login00 ~]$ cd matlab_example | ||
+ | </ | ||
+ | <note tip> | ||
+ | |||
+ | As you go through the following examples, it is suggested that you also create a new directory for each of them. This makes it easier to follow and track the output files of the different jobs that you will be running. | ||
+ | </ | ||
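One possible way to follow this tip is to create one subdirectory per example up front. This is only a sketch of a layout: the subdirectory names (`batch_serial`, `batch_parallel`, `interactive`) are hypothetical and do not come from the original instructions, and the demonstration runs in a scratch directory rather than in your home directory.

```shell
#!/bin/bash
# Sketch: create one subdirectory per example under a base directory.
# The subdirectory names passed in below are hypothetical placeholders.
make_example_dirs() {
    local base="$1"
    shift
    local name
    for name in "$@"; do
        mkdir -p "${base}/${name}"
    done
}

# Demonstrate in a scratch location instead of the real home directory.
demo_base="$(mktemp -d)/matlab_example"
make_example_dirs "$demo_base" batch_serial batch_parallel interactive
ls "$demo_base"
```

On the cluster you would pass `"$HOME/matlab_example"` as the base directory instead of the scratch path.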
+ | |||
+ | Now create the following file and put it in the '' | ||
+ | ===== Matlab function ===== | ||
+ | |||
+ | We will be using this sample function on the Caviness cluster in multiple demonstrations. | ||
+ | <file matlab maxEig.m> | ||
+ | function maxe = maxEig(sd, | ||
+ | % maxEig | ||
+ | % Input parameters | ||
+ | % sd - seed for random generator | ||
+ | % dim - size of the square matrix | ||
+ | % | ||
+ | % maxe - maximum real eigenvalue | ||
+ | if (isdeployed) | ||
+ | sd = str2num(sd) | ||
+ | dim = str2num(dim) | ||
+ | end | ||
+ | |||
+ | rng(sd); | ||
+ | ev = eig( randn(dim) ); | ||
+ | maxe = max( ev(imag(ev)==0) ) | ||
+ | end | ||
+ | </ | ||
+ | |||
+ | |||
+ | The remainder of this page is based on using this MATLAB function to illustrate using MATLAB interactively and in batch. | ||
+ | |||
+ | Finally it will be compiled and deployed using the MATLAB Compiler Runtime (MCR) environment. | ||
+ | |||
+ | <note important> | ||
+ | We want to select only the real eigenvalues to compute the maximum. | ||
+ | </ | ||
+ | |||
+ | <note tip>The last line of this function does not have a semicolon. | ||
+ | <code matlab> | ||
+ | maxe = max( ev(imag(ev)==0) ); | ||
+ | fprintf(' | ||
+ | </ | ||
+ | </ | ||
+ | |||
+ | |||
+ | ==== Matlab script ==== | ||
+ | Now, write a MATLAB script file and put it in the '' | ||
+ | |||
+ | <file matlab script.m> | ||
+ | % script to run maxEig function 200 times and print average. | ||
+ | |||
+ | count = 200; | ||
+ | dim = 5001; | ||
+ | sumMaxe = 0; | ||
+ | tic; | ||
+ | for i=1:count; | ||
+ | sumMaxe = sumMaxe + maxEig(i, | ||
+ | end; | ||
+ | toc | ||
+ | avgMaxEig = sumMaxe/ | ||
+ | |||
+ | quit | ||
+ | </ | ||
+ | |||
+ | This is a detailed script example, which calls the '' | ||
+ | |||
+ | <note tip> | ||
+ | This script ends in a **__quit__** command (equivalent to MATLAB **__exit__**). | ||
+ | terminates MATLAB when done. If you run this from the bash command line (interactively) with the '' | ||
+ | |||
+ | Without the **__quit__** you will come back to the MATLAB prompt on completion for an interactive job. If this is the last line of a batch queue script, then the only difference will be the MATLAB prompt ''>>'' | ||
+ | </ | ||
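Since memory has to be requested up front, a rough lower bound for one call of the function can be worked out from the matrix size used in the script (`dim = 5001`). The sketch below assumes only that MATLAB stores the matrix as 8-byte doubles; the helper names are made up for illustration, and the eigenvalue computation needs additional workspace beyond this single matrix, so treat the result as a floor rather than a request size.

```shell
#!/bin/bash
# Sketch: bytes needed to hold one dim-by-dim matrix of 8-byte doubles.
matrix_bytes() {
    local dim="$1"
    echo $(( dim * dim * 8 ))
}

# Convert a byte count to MiB with one decimal place.
bytes_to_mib() {
    awk -v b="$1" 'BEGIN { printf "%.1f\n", b / (1024 * 1024) }'
}

dim=5001
bytes=$(matrix_bytes "$dim")
echo "One ${dim}x${dim} double matrix: ${bytes} bytes ($(bytes_to_mib "$bytes") MiB)"
```

For `dim = 5001` this gives 200080008 bytes, or about 190.8 MiB, for the random matrix alone.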
+ | ===== Copy the project folder ===== | ||
+ | |||
+ | If you created the files on your desktop version of MATLAB, now copy the folder to your '' | ||
+ | Use any [[: | ||
+ | ====== Batch Job ====== | ||
+ | You should have a copy of your MATLAB [[#project directory]] on the cluster. | ||
+ | |||
+ | <note important> | ||
+ | |||
+ | MATLAB has a new version twice a year. It is important to keep the version you use on your desktop the same as the | ||
+ | one on the cluster. | ||
+ | < | ||
+ | vpkg_versions matlab | ||
+ | </ | ||
+ | will show you the versions available on a cluster. | ||
+ | </ | ||
+ | |||
+ | <note tip> | ||
+ | |||
+ | It is frequently advisable to keep your MATLAB project clean from non-MATLAB files such as the job | ||
+ | script file and the script output file. But you may combine them, and even use the MATLAB editor to | ||
+ | create the script file and look at the output file. | ||
+ | If you create the file on a Windows desktop, take care to not transfer the files as binary. See [[abstract: | ||
+ | |||
+ | When you have one combined directory, do not put the '' | ||
+ | to the project directory using '' | ||
+ | </ | ||
+ | ===== Create a job script file ===== | ||
+ | You should create a job script file to submit a batch job. Start by modifying a batch job script template file (''/ | ||
+ | In your newly copied serial.qs file, add the following lines at the end. | ||
+ | < | ||
+ | [traine@login00 matlab_example]$ cp / | ||
+ | </ | ||
+ | < | ||
+ | # Add vpkg_require commands after this line: | ||
+ | vpkg_require matlab | ||
+ | #Running the Matlab main_script | ||
+ | matlab -nodisplay -singleCompThread -batch main_script | ||
+ | </ | ||
+ | Note we did not specify a version of MATLAB with the VALET command, so we will get the default version ('' | ||
+ | < | ||
+ | display 'Hello World' | ||
+ | </ | ||
+ | When this script runs it will display '' | ||
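The edit described in this section can also be scripted. The following is a minimal sketch, assuming you already have a copy of the template: it appends the same lines shown above to the end of a job script. The function name `append_matlab_lines` is hypothetical, and the demonstration uses a scratch file as a stand-in for the copied template.

```shell
#!/bin/bash
# Sketch: append the VALET and MATLAB lines to the end of a job script.
append_matlab_lines() {
    local job_script="$1"
    cat >> "$job_script" <<'EOF'
# Add vpkg_require commands after this line:
vpkg_require matlab
# Running the MATLAB main_script
matlab -nodisplay -singleCompThread -batch main_script
EOF
}

# Demonstrate on a scratch file standing in for the copied template.
demo_script="$(mktemp)"
echo '#!/bin/bash -l' > "$demo_script"
append_matlab_lines "$demo_script"
tail -n 4 "$demo_script"
```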
+ | |||
+ | ===== Submit batch job ===== | ||
+ | Your shell must be in a [[abstract: | ||
+ | to submit any jobs. | ||
+ | Use the '' | ||
+ | and note the ''<< | ||
+ | submit the job you would type | ||
+ | < | ||
+ | sbatch matlab_first.qs | ||
+ | </ | ||
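When a job is submitted successfully, `sbatch` prints a confirmation of the form `Submitted batch job <id>`. If you want to reuse the job ID in later commands, one option is to capture it, as in this sketch. The function name `parse_job_id` is hypothetical, and the message is hard-coded here so the example runs without a scheduler.

```shell
#!/bin/bash
# Sketch: pull the numeric job ID out of sbatch's confirmation message.
parse_job_id() {
    local message="$1"
    # Keep only the text after the last space.
    echo "${message##* }"
}

# On the cluster the message would come from: message=$(sbatch matlab_first.qs)
message="Submitted batch job 9803575"
jobid=$(parse_job_id "$message")
echo "Job ID: ${jobid}"
```

Slurm also provides `sbatch --parsable`, which prints the job ID without the surrounding text and may be simpler if you do not need the full message.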
+ | |||
+ | <note important> | ||
+ | |||
+ | This is the message you get if you are not in a workgroup. | ||
+ | |||
+ | | ||
+ | </ | ||
+ | |||
+ | <note warning> | ||
+ | |||
+ | **Bash script vs job script** | ||
+ | |||
+ | It is true that a job script file is (usually) a bash script, but it must be executed with the '' | ||
+ | </ | ||
+ | ===== Wait for job to complete ===== | ||
+ | You can [[abstract: | ||
+ | For example, to list the information for job ''<< | ||
+ | < | ||
+ | scontrol show job << | ||
+ | </ | ||
+ | |||
+ | For long running jobs, you could change your job script to notify you via an email message when the job is | ||
+ | complete. | ||
+ | |||
+ | |||
+ | ===== Post process job ===== | ||
+ | All MATLAB output data files will be in the project directory, but the MATLAB standard output will be in | ||
+ | the current directory, from which you submitted the job. If you did not redefine Slurm output for your job, then you'll be looking for a file '' | ||
+ | |||
+ | ====== Interactive job ====== | ||
+ | |||
+ | Here are specific details for running MATLAB as an interactive job on a compute node. You should have a copy of your [[#MATLAB project directory]] on the cluster; it will be referred to as '' | ||
+ | |||
+ | ===== Command-line ===== | ||
+ | |||
+ | You should work on a compute node when in command-line MATLAB. | ||
+ | Your shell must be in a [[abstract: | ||
+ | to submit a single threaded interactive job using '' | ||
+ | |||
+ | < | ||
+ | |||
+ | [traine@login00 ~]$ workgroup -g it_css | ||
+ | [(it_css: | ||
+ | salloc: Pending job allocation 7809686 | ||
+ | salloc: job 7809686 queued and waiting for resources | ||
+ | salloc: job 7809686 has been allocated resources | ||
+ | salloc: Granted job allocation 7809686 | ||
+ | salloc: Waiting for resource configuration | ||
+ | salloc: Nodes r00n16 are ready for job | ||
+ | [traine@r00n16 ~]$ vpkg_require matlab | ||
+ | Adding package `matlab/ | ||
+ | [traine@r00n16 ~]$ cd matlab_example | ||
+ | [traine@r00n16 matlab_example]$ matlab -nodesktop -singleCompThread | ||
+ | MATLAB is selecting SOFTWARE OPENGL rendering. | ||
+ | |||
+ | < M A T L A B (R) > | ||
+ | | ||
+ | R2018b (9.5.0.944444) 64-bit (glnxa64) | ||
+ | August 28, 2018 | ||
+ | |||
+ | |||
+ | To get started, type doc. | ||
+ | For product information, | ||
+ | >> | ||
+ | </ | ||
+ | |||
+ | This will start an interactive command-line session in your terminal window. | ||
+ | < | ||
+ | MATLAB is selecting SOFTWARE OPENGL rendering. | ||
+ | |||
+ | < M A T L A B (R) > | ||
+ | Copyright 1984-2018 The MathWorks, Inc. | ||
+ | | ||
+ | | ||
+ | |||
+ | |||
+ | To get started, type doc. | ||
+ | For product information, | ||
+ | >> | ||
+ | [traine@r00n16 matlab_example]$ exit | ||
+ | exit | ||
+ | salloc: Relinquishing job allocation 7809686 | ||
+ | [(it_css: | ||
+ | |||
+ | </ | ||
+ | |||
+ | ===== Desktop ===== | ||
+ | You should be on a compute node before you start MATLAB. To start a MATLAB desktop (GUI mode) on a cluster, you must be running an X11 server and you must have connected to the cluster with '' | ||
+ | |||
+ | You must be in a workgroup environment to submit a job using '' | ||
+ | |||
+ | <code bash> | ||
+ | [traine@login00 ~]$ workgroup -g it_css | ||
+ | [(it_css: | ||
+ | salloc: Pending job allocation 7790913 | ||
+ | salloc: job 7790913 queued and waiting for resources | ||
+ | salloc: job 7790913 has been allocated resources | ||
+ | salloc: Granted job allocation 7790913 | ||
+ | salloc: Waiting for resource configuration | ||
+ | salloc: Nodes r00n10 are ready for job | ||
+ | [traine@r00n10 ~]$ vpkg_require matlab | ||
+ | Adding package `matlab/ | ||
+ | [traine@r00n10 ~]$ matlab | ||
+ | MATLAB is selecting SOFTWARE OPENGL rendering. | ||
+ | |||
+ | </ | ||
+ | |||
+ | This will start an interactive MATLAB desktop GUI mode session on your desktop in an X11 window using your workgroup resources. | ||
+ | |||
+ | {{ : | ||
+ | |||
+ | When done type the '' | ||
+ | |||
+ | See [[software: | ||
+ | |||
+ | For more information, | ||
+ | |||
+ | For more information on GUI Applications on Caviness, visit [[abstract: | ||
+ | |||
+ | ====== Compiling with Matlab ====== | ||
+ | |||
+ | We show the three most common ways to work with compilers when using MATLAB. | ||
+ | |||
+ | - Compiling your Matlab code to run in the MCR (MATLAB Compiler Runtime) | ||
+ | - Compiling your C or Fortran program to call MATLAB engine. | ||
+ | - Compiling your own function in C or Fortran to be used in a MATLAB session. | ||
+ | |||
+ | < | ||
+ | < | ||
+ | Warning: You are using gcc version ' | ||
+ | with MEX is ' | ||
+ | http:// | ||
+ | </ | ||
+ | But the compilation completes successfully. | ||
+ | </ | ||
+ | ===== Compiling your Matlab code ===== | ||
+ | |||
+ | There is an example MCR project in the ''/ | ||
+ | |||
+ | ==== Copy dev-projects template ==== | ||
+ | |||
+ | On the head node, copy the example project into your current directory using the following commands | ||
+ | < | ||
+ | [traine@login00 ~]$ workgroup -g it_css | ||
+ | [(it_css: | ||
+ | [(it_css: | ||
+ | [(it_css: | ||
+ | |||
+ | </ | ||
+ | |||
+ | ==== Compile with make ==== | ||
+ | |||
+ | Now compile on the compute node by using | ||
+ | |||
+ | < | ||
+ | [(it_css: | ||
+ | salloc: Granted job allocation 7861739 | ||
+ | salloc: Waiting for resource configuration | ||
+ | salloc: Nodes r00n56 are ready for job | ||
+ | [traine@r00n56 MCR]$ | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | |||
+ | Check and edit the VALET command in the '' | ||
+ | < | ||
+ | [traine@r00n56 MCR]$ make | ||
+ | Adding package `mcr/ | ||
+ | make[1]: Entering directory `/ | ||
+ | mcc -o maxEig -I ./common -R "" | ||
+ | Compiler version: 7.1 (R2019b) | ||
+ | Dependency analysis by REQUIREMENTS. | ||
+ | Parsing file "/ | ||
+ | (referenced from command line). | ||
+ | Generating file "/ | ||
+ | Generating file " | ||
+ | make[1]: Leaving directory `/ | ||
+ | </ | ||
+ | Take note of the package added, and the files that are generated. | ||
+ | Remember the VALET command used to load the appropriate version of the '' | ||
+ | ==== Test interactively ==== | ||
+ | |||
+ | Test interactively on the same compute node: | ||
+ | < | ||
+ | [traine@r00n56 MCR]$ vpkg_require mcr/ | ||
+ | Adding package `mcr/ | ||
+ | [traine@r00n56 MCR]$ time ./maxEig 20.8 | ||
+ | |||
+ | maxe = | ||
+ | |||
+ | 510.8787 | ||
+ | |||
+ | |||
+ | real 6m58.608s | ||
+ | user 6m38.486s | ||
+ | sys | ||
+ | </ | ||
+ | |||
+ | <note tip>This example is designed as a test for batch computing, and takes between 5 and 15 minutes to complete. If you | ||
+ | change the MATLAB statement dim=10000 to dim=1000, and recompile, it will take about 10 seconds</ | ||
+ | === Back to the head node === | ||
+ | When done, type '' | ||
+ | < | ||
+ | [traine@r00n56 MCR]$ exit | ||
+ | exit | ||
+ | salloc: Relinquishing job allocation 7861739 | ||
+ | [(it_css: | ||
+ | </ | ||
+ | ==== Test batch ==== | ||
+ | === Copy array job example === | ||
+ | On the head node, copy the MCR array example project and the '' | ||
+ | < | ||
+ | [(it_css: | ||
+ | [(it_css: | ||
+ | [(it_css: | ||
+ | [(it_css: | ||
+ | [(it_css: | ||
+ | Adding package `mcr/2019b` to your environment | ||
+ | make[1]: Entering directory `/ | ||
+ | mcc -o maxEig -I ./common -R "" | ||
+ | Compiler version: 7.1 (R2019b) | ||
+ | Dependency analysis by REQUIREMENTS. | ||
+ | Parsing file "/ | ||
+ | (referenced from command line). | ||
+ | Generating file "/ | ||
+ | Generating file " | ||
+ | make[1]: Leaving directory `/ | ||
+ | </ | ||
+ | |||
+ | The following lines will need to be changed or added to the '' | ||
+ | |||
+ | < | ||
+ | ... | ||
+ | 36 #SBATCH --mem=3G | ||
+ | ... | ||
+ | 54 #SBATCH --job-name=matlab_mcr_array | ||
+ | ... | ||
+ | 65 #SBATCH --partition=_workgroup_ | ||
+ | ... | ||
+ | 92 #SBATCH --output arrayJob-%A-%3a.out | ||
+ | ... | ||
+ | 117 # Setting the job array options | ||
+ | 118 #SBATCH --array=1-100: | ||
+ | ... | ||
+ | 157 # Load a specific Matlab MCR package into the runtime environment: | ||
+ | 158 # | ||
+ | 159 vpkg_require mcr/ | ||
+ | 160 | ||
+ | 161 # | ||
+ | 162 # Do standard MCR environment setup: | ||
+ | 163 # | ||
+ | 164 . / | ||
+ | 165 | ||
+ | 166 # | ||
+ | 167 # Execute your MCR program(s) here; prefix with UD_EXEC to | ||
+ | 168 # ensure the job can/will respond to preemption/ | ||
+ | 169 # signals by calling your UD_JOB_EXIT_FN. | ||
+ | 170 # | ||
+ | 171 # Duplicate all three commands for each MCR program you run | ||
+ | 172 # in sequence below. | ||
+ | 173 # | ||
+ | 174 #UD_EXEC my_mcr_program arg1 arg2 | ||
+ | 175 #mcr_rc=$? | ||
+ | 176 #if [ $mcr_rc -ne 0 ]; then exit $mcr_rc; fi | ||
+ | 177 | ||
+ | 178 echo "Job Running on Host: $HOSTNAME" | ||
+ | 179 | ||
+ | 180 start=$(date " | ||
+ | 181 echo "Job Start: ${start}" | ||
+ | 182 | ||
+ | 183 #Using the Slurm task ID as an argument lambda to MaxEig | ||
+ | 184 let lambda=$SLURM_ARRAY_TASK_ID | ||
+ | 185 | ||
+ | 186 #Lines Added for MCR_array example | ||
+ | 187 UD_EXEC ${HOME}/ | ||
+ | 188 mcr_rc=$? | ||
+ | 189 if [ $mcr_rc -ne 0 ]; then exit $mcr_rc; fi | ||
+ | 190 | ||
+ | 191 finish=$(date " | ||
+ | 192 echo "Job Finish: ${finish}" | ||
+ | 193 | ||
+ | 194 runtime=$(($finish-$start)) | ||
+ | 195 | ||
+ | 196 echo "Total Runtime: ${runtime}" | ||
+ | </ | ||
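The timing lines at the end of the job script follow a common pattern: record the time before and after the program runs, then subtract. A minimal sketch of that pattern is shown below. It assumes the timestamps are epoch seconds obtained with `date "+%s"` (the exact format string is cut off in the listing above), and the wrapper name `run_timed` is hypothetical.

```shell
#!/bin/bash
# Sketch: time a command using epoch seconds and report the difference.
# The "+%s" format is an assumption about what the job script uses.
run_timed() {
    local start finish
    start=$(date "+%s")
    "$@"
    local rc=$?            # remember the command's exit status
    finish=$(date "+%s")
    echo "Total Runtime: $(( finish - start ))"
    return $rc
}

run_timed sleep 1
```

The wrapper returns the exit status of the command it ran, so a failure of the timed program is still visible to the caller.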
+ | |||
+ | Example '' | ||
+ | |||
+ | < | ||
+ | [(it_css: | ||
+ | Submitted batch job 9803575 | ||
+ | [(it_css: | ||
+ | Fri Oct 30 08:57:58 EDT 2020 | ||
+ | [(it_css: | ||
+ | Fri Oct 30 08:58:19 EDT 2020 | ||
+ | [(it_css: | ||
+ | 100 | ||
+ | </ | ||
+ | |||
+ | There are 100 output files with the names '' | ||
+ | For example, file 50 which is '' | ||
+ | |||
+ | < | ||
+ | Adding package `mcr/2019b` to your environment | ||
+ | -- Matlab MCR environment setup complete (on r00n13): | ||
+ | -- MCR_ROOT | ||
+ | -- MCR_CACHE_ROOT | ||
+ | |||
+ | Job Running on Host: r00n13.localdomain.hpc.udel.edu | ||
+ | Job Start: 1604062673 | ||
+ | |||
+ | maxe = | ||
+ | |||
+ | 525.9320 | ||
+ | |||
+ | Job Finish: 1604062704 | ||
+ | Total Runtime: 31 | ||
+ | </ | ||
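With 100 output files, it is usually more convenient to collect the results with a script than to open each file. The sketch below assumes every output file contains a `maxe =` line followed, after a blank line, by the value, as in the sample above. The function names are hypothetical, and the demonstration builds a scratch file that mimics one task's output.

```shell
#!/bin/bash
# Sketch: print the number that follows each "maxe =" line in the given files.
extract_maxe() {
    awk '
        /^[[:space:]]*maxe =/ { want = 1; next }      # the value follows this header
        want && NF > 0        { print $1; want = 0 }  # first non-blank line after it
    ' "$@"
}

# Average the extracted values across all given files.
average_maxe() {
    extract_maxe "$@" | awk '{ s += $1; n++ } END { if (n) printf "%.4f\n", s / n }'
}

# Demonstrate on a scratch file that mimics one array task's output.
demo_out="$(mktemp)"
printf 'Job Start: 1604062673\n\nmaxe =\n\n  525.9320\n\nJob Finish: 1604062704\n' > "$demo_out"
extract_maxe "$demo_out"
```

On the cluster, something like `average_maxe arrayJob-*.out` would process all task outputs at once, assuming the files match that glob.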
+ | |||
+ | |||
+ | [[more examples]] **//Under construction: | ||
+ | |||
+ | ===== Compiling your code to use MATLAB engine ===== | ||
+ | |||
+ | Here is a simple example function called '' | ||
+ | |||
+ | On the head node and in your workgroup shell: | ||
+ | |||
+ | < | ||
+ | [(it_css: | ||
+ | [(it_css: | ||
+ | [(it_css: | ||
+ | [(it_css: | ||
+ | [(it_css: | ||
+ | [(it_css: | ||
+ | [(it_css: | ||
+ | Warning: MATLAB FORTRAN MEX Files are now defaulting to -largeArrayDims and 8 byte integers. | ||
+ | If you are building a FORTRAN S-Function, please recompile using the -compatibleArrayDims flag. | ||
+ | You can find more about adapting code to use 64-bit array dimensions at: | ||
+ | | ||
+ | Building with ' | ||
+ | MEX completed successfully. | ||
+ | [(it_css: | ||
+ | </ | ||
+ | |||
+ | Running this program requires an interactive session on a compute node with X11 forwarding enabled. | ||
+ | |||
+ | < | ||
+ | [(it_css: | ||
+ | salloc: Granted job allocation 7915683 | ||
+ | salloc: Waiting for resource configuration | ||
+ | salloc: Nodes r03g07 are ready for job | ||
+ | [traine@r03g07 matlab_compile]$ vpkg_require matlab/ | ||
+ | Adding package `matlab/ | ||
+ | Adding package `gcc/9.1.0` to your environment | ||
+ | [traine@r03g07 matlab_compile]$ export LD_LIBRARY_PATH=$MATLABROOT/ | ||
+ | [traine@r03g07 matlab_compile]$ ./fengdemo | ||
+ | </ | ||
+ | |||
+ | Shortly after starting to run the program, '' | ||
+ | |||
+ | {{ : | ||
+ | |||
+ | After the Matlab window is opened, you will see a prompt in the terminal to " | ||
+ | |||
+ | After the table is returned, close the MATLAB window with the Chart. Then use the '' | ||
+ | |||
+ | < | ||
+ | Type 0 < | ||
+ | Type 1 < | ||
+ | 1 | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | [traine@r03g07 matlab_compile]$ exit | ||
+ | salloc: Relinquishing job allocation 7915683 | ||
+ | [(it_css: | ||
+ | </ | ||
+ | |||
+ | ===== Compiling your own MATLAB function ===== | ||
+ | |||
+ | There is a simple example function '' | ||
+ | |||
+ | On the head node and in a workgroup shell: | ||
+ | |||
+ | < | ||
+ | [(it_css: | ||
+ | [(it_css: | ||
+ | [(it_css: | ||
+ | [(it_css: | ||
+ | Adding package `matlab/ | ||
+ | Adding package `gcc/9.1.0` to your environment | ||
+ | [(it_css: | ||
+ | [(it_css: | ||
+ | Building with ' | ||
+ | Warning: You are using gcc version ' | ||
+ | MEX completed successfully. | ||
+ | [(it_css: | ||
+ | |||
+ | </ | ||
+ | |||
+ | To start MATLAB on a compute node to test this new function: | ||
+ | |||
+ | < | ||
+ | [(it_css: | ||
+ | salloc: Pending job allocation 7916296 | ||
+ | salloc: job 7916296 queued and waiting for resources | ||
+ | salloc: job 7916296 has been allocated resources | ||
+ | salloc: Granted job allocation 7916296 | ||
+ | salloc: Waiting for resource configuration | ||
+ | salloc: Nodes r00n56 are ready for job | ||
+ | [traine@r00n56 matlab_function]$ vpkg_require matlab/ | ||
+ | [traine@r00n56 matlab_function]$ matlab -nodesktop | ||
+ | MATLAB is selecting SOFTWARE OPENGL rendering. | ||
+ | |||
+ | < M A T L A B (R) > | ||
+ | Copyright 1984-2019 The MathWorks, Inc. | ||
+ | R2019a (9.6.0.1072779) 64-bit (glnxa64) | ||
+ | March 8, 2019 | ||
+ | |||
+ | To get started, type doc. | ||
+ | For product information, | ||
+ | |||
+ | >> | ||
+ | </ | ||
+ | |||
+ | Now test the function by typing '' | ||
+ | |||
+ | < | ||
+ | >> timestwo(4) | ||
+ | |||
+ | ans = | ||
+ | |||
+ | 8 | ||
+ | |||
+ | >> quit | ||
+ | [traine@r00n56 matlab_function]$ exit | ||
+ | exit | ||
+ | salloc: Relinquishing job allocation 7916296 | ||
+ | [(it_css: | ||
+ | </ | ||
+ | ====== Batch job serial example ====== | ||
+ | |||
+ | Second, write a shell script file to set the MATLAB environment and start MATLAB running your script file. The following script file will set the MATLAB environment and run the command in the [[# | ||
+ | |||
+ | < | ||
+ | '' | ||
+ | </ | ||
+ | |||
+ | |||
+ | < | ||
+ | [(it_css: | ||
+ | [(it_css: | ||
+ | [(it_css: | ||
+ | [(it_css: | ||
+ | [(it_css: | ||
+ | </ | ||
+ | <file bash batch.qs> | ||
+ | ... | ||
+ | 40 #SBATCH --job-name=script.m | ||
+ | ... | ||
+ | 50 #SBATCH --partition=_workgroup_ | ||
+ | ... | ||
+ | 67 #SBATCH --time=0-03: | ||
+ | ... | ||
+ | 76 #SBATCH --output %x-%j.out | ||
+ | 77 #SBATCH --error %x-%j.out | ||
+ | ... | ||
+ | 86 #SBATCH --mail-user=' | ||
+ | 87 #SBATCH --mail-type=END, | ||
+ | ... | ||
+ | 137 # | ||
+ | 138 # [EDIT] Add your script statements hereafter, or execute a script or program | ||
+ | 139 # using the srun command. | ||
+ | 140 # | ||
+ | 141 #srun date | ||
+ | 142 #Loading MATLAB | ||
+ | 143 vpkg_require matlab/ | ||
+ | 144 #Running the matlab script | ||
+ | 145 matlab -nodisplay -nojvm -batch script | ||
+ | |||
+ | </ | ||
+ | |||
+ | |||
+ | Make sure you change the '' | ||
+ | The '' | ||
+ | |||
+ | <note tip> | ||
+ | The command '' | ||
+ | the compound command | ||
+ | < | ||
+ | "try; script; catch ERR; disp(getReport(ERR,' | ||
+ | </ | ||
+ | The purpose of the **'' | ||
+ | </ | ||
+ | |||
+ | <note tip> | ||
+ | |||
+ | * Do not include the '' | ||
+ | * Do set paper dimensions and print each figure to a file. | ||
+ | |||
+ | The text output will be included in the standard Slurm output file, but not any graphics. | ||
+ | |||
+ | We suggest setting the current figure' | ||
+ | |||
+ | <code matlab> | ||
+ | set(gcf,' | ||
+ | print(' | ||
+ | </ | ||
+ | |||
+ | will set the current figure to be 4 x 3 inches with no margins, and then print the figure as a 400x300 resolution '' | ||
+ | </ | ||
+ | |||
+ | ==== Submit job ==== | ||
+ | Third, from the directory with '' | ||
+ | |||
+ | < | ||
+ | sbatch batch.qs | ||
+ | </ | ||
+ | ==== Wait for completion ==== | ||
+ | Finally, wait for the mail notification, | ||
+ | |||
+ | After about 2 to 3 hours, a message arrived from the SLURM Administrator. The email has a subject like the one shown below and an empty body. | ||
+ | < | ||
+ | SLURM Job_id=7937771 Name=script.m Ended, Run time 02:47:11, COMPLETED, ExitCode 0 | ||
+ | </ | ||
+ | |||
+ | ==== Gather results ==== | ||
+ | The results for Job 7937771 are in the file | ||
+ | <file text script.m-7937771.out > | ||
+ | Fri Apr 10 16:36:38 EDT 2020 | ||
+ | Adding package `matlab/ | ||
+ | |||
+ | < M A T L A B (R) > | ||
+ | Copyright 1984-2018 The MathWorks, Inc. | ||
+ | | ||
+ | August 28, 2018 | ||
+ | |||
+ | |||
+ | For online documentation, | ||
+ | For product information, | ||
+ | |||
+ | |||
+ | maxe = | ||
+ | |||
+ | | ||
+ | |||
+ | |||
+ | maxe = | ||
+ | |||
+ | | ||
+ | |||
+ | |||
+ | maxe = | ||
+ | |||
+ | | ||
+ | |||
+ | |||
+ | maxe = | ||
+ | |||
+ | | ||
+ | |||
+ | |||
+ | maxe = | ||
+ | |||
+ | | ||
+ | |||
+ | |||
+ | maxe = | ||
+ | |||
+ | | ||
+ | |||
+ | |||
+ | maxe = | ||
+ | |||
+ | | ||
+ | |||
+ | |||
+ | maxe = | ||
+ | |||
+ | | ||
+ | |||
+ | |||
+ | maxe = | ||
+ | |||
+ | | ||
+ | |||
+ | | ||
+ | |||
+ | maxe = | ||
+ | |||
+ | 67.4221 | ||
+ | Elapsed time is 10023.165546 seconds. | ||
+ | |||
+ | avgMaxEig = | ||
+ | |||
+ | 69.5131 | ||
+ | |||
+ | </ | ||
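The elapsed time reported by MATLAB is in seconds, while Slurm reports run time as HH:MM:SS. To compare the two, the seconds can be converted as in this sketch (the function name `secs_to_hms` is hypothetical).

```shell
#!/bin/bash
# Sketch: convert a number of seconds to HH:MM:SS.
secs_to_hms() {
    local total="${1%.*}"    # drop any fractional part
    printf '%02d:%02d:%02d\n' \
        $(( total / 3600 )) $(( (total % 3600) / 60 )) $(( total % 60 ))
}

secs_to_hms 10023.165546    # prints 02:47:03
```

The result, 02:47:03, is a few seconds shorter than the 02:47:11 run time in the email notification. The difference is presumably time spent outside the `tic`/`toc` interval, such as loading the package and starting MATLAB.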
+ | ==== Timings and core count ==== | ||
+ | |||
+ | Consider a batch job run with these Slurm options: | ||
+ | < | ||
+ | |||
+ | # | ||
+ | # | ||
+ | # | ||
+ | </ | ||
+ | |||
+ | The '' | ||
+ | < | ||
+ | [(it_css: | ||
+ | [(it_css: | ||
+ | r01n17 | ||
+ | [(it_css: | ||
+ | PID RUSER %CPU %MEM THCNT STIME TIME COMMAND | ||
+ | 10853 traine | ||
+ | </ | ||
+ | This '' | ||
+ | |||
+ | Given the reported PID '' | ||
+ | < | ||
+ | [(it_css: | ||
+ | UID PID PPID | ||
+ | traine | ||
+ | </ | ||
+ | |||
+ | While the batch job was running on node '' | ||
+ | every second '' | ||
+ | |||
+ | < | ||
+ | [(it_css: | ||
+ | PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND | ||
+ | 10906 traine | ||
+ | </ | ||
+ | |||
+ | |||
+ | < | ||
+ | [(it_css: | ||
+ | HOSTNAME | ||
+ | ---------------------------------------------------------------------------------------------- | ||
+ | r01n17 | ||
+ | </ | ||
+ | |||
+ | After the job is done you can use '' | ||
+ | < | ||
+ | [(it_css: | ||
+ | | ||
+ | ---------- ------------ --------------- ---------- ---------- ---------- ------------------- ------------------- ---------- ---------- | ||
+ | script_op+ 7935464 | ||
+ | batch 7935464.bat+ | ||
+ | extern 7935464.ext+ | ||
+ | date 7935464.0 | ||
+ | </ | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | ====== Batch job parallel example ====== | ||
+ | |||
+ | The MATLAB Parallel Computing toolbox uses JVM to manage the workers and communicate while you are running. | ||
+ | need to setup the MATLAB pools in your '' | ||
+ | |||
+ | ==== Matlab parallel script ==== | ||
+ | Here is the slightly modified MATLAB script. | ||
+ | |||
+ | Add the necessary commands to configure your '' | ||
+ | <file text pscript.m> | ||
+ | % script to run maxEig function 200 times | ||
+ | %% Configure parpool | ||
+ | myCluster = parcluster(' | ||
+ | myCluster.NumWorkers = str2double(getenv(' | ||
+ | myCluster.JobStorageLocation = getenv(' | ||
+ | myPool = parpool(myCluster, | ||
+ | |||
+ | count = 200; | ||
+ | dim = 5001; | ||
+ | sumMaxe = 0; | ||
+ | |||
+ | tic | ||
+ | parfor i=1:count; | ||
+ | sumMaxe = sumMaxe + maxEig(i, | ||
+ | end | ||
+ | toc | ||
+ | avgMaxEig = sumMaxe/ | ||
+ | |||
+ | delete(myPool); | ||
+ | exit | ||
+ | |||
+ | </ | ||
+ | |||
+ | ==== Slurm parallel script ==== | ||
+ | Remove the option '' | ||
+ | Copy the template '' | ||
+ | < | ||
+ | cp / | ||
+ | </ | ||
+ | Make the following changes to the code | ||
+ | <file bash pbatch.qs> | ||
+ | ... | ||
+ | 19 #SBATCH --ntasks=20 | ||
+ | ... | ||
+ | 37 #SBATCH --mem=60G | ||
+ | ... | ||
+ | 54 #SBATCH --job-name=matlab-pscript | ||
+ | ... | ||
+ | 65 #SBATCH --partition=_workgroup_ | ||
+ | ... | ||
+ | 75 #SBATCH --time=0-01: | ||
+ | ... | ||
+ | 82 #SBATCH --time-min=0-00: | ||
+ | ... | ||
+ | 90 #SBATCH --output=%x-%j.out | ||
+ | 91 #SBATCH --error=%x-%j.out | ||
+ | ... | ||
+ | 155 vpkg_require matlab/ | ||
+ | ... | ||
+ | 170 UD_EXEC matlab -nodisplay -batch pscript | ||
+ | |||
+ | </ | ||
+ | |||
+ | ==== Timing results ==== | ||
+ | Reported usage for the same job run using the Parallel Computing Toolbox. | ||
+ | < | ||
+ | [(it_css: | ||
+ | | ||
+ | ---------- ------------ --------------- ---------- ---------- ---------- ------------------- ------------------- ---------- ---------- | ||
+ | script.m 7994763 | ||
+ | batch 7994763.bat+ | ||
+ | extern 7994763.ext+ | ||
+ | | ||
+ | batch 7997035.bat+ | ||
+ | extern 7997035.ext+ | ||
+ | |||
+ | |||
+ | </ | ||
+ | |||
+ | Compare script vs pscript | ||
+ | |||
+ | ^ Job ^ Elapsed Time ^ CPUTime ^ Max RSS^ | ||
+ | | script.m | 02:48:22| 4-05:01:12 | 804888K | | ||
+ | | pscript | 00:19:35 | 09:33:20 | 14462792K | | ||
+ | |||
+ | The job **script** used more CPU resources with the multiple computational threads, while **pscript** used more memory resources with 20 single-threaded workers. | ||
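To put numbers on this comparison, the durations in the table can be converted to seconds and divided. The sketch below assumes the durations use Slurm's `[D-]HH:MM:SS` layout, as they appear in the table; the function names are hypothetical.

```shell
#!/bin/bash
# Sketch: convert a duration in [D-]HH:MM:SS form to seconds.
duration_to_secs() {
    local dur="$1" days=0 h m s
    if [[ "$dur" == *-* ]]; then
        days="${dur%%-*}"    # part before the dash is the day count
        dur="${dur#*-}"
    fi
    IFS=: read -r h m s <<< "$dur"
    # 10# forces base 10 so that fields like "09" are not read as octal.
    echo $(( 10#$days * 86400 + 10#$h * 3600 + 10#$m * 60 + 10#$s ))
}

# Ratio of two durations, to two decimal places.
ratio() {
    awk -v a="$(duration_to_secs "$1")" -v b="$(duration_to_secs "$2")" \
        'BEGIN { printf "%.2f\n", a / b }'
}

echo "Elapsed-time speedup: $(ratio 02:48:22 00:19:35)"
echo "CPU-time ratio:       $(ratio 4-05:01:12 09:33:20)"
```

With the values from the table, **pscript** finishes about 8.6 times sooner in elapsed time and uses roughly a tenth of the CPU time, at the cost of about 18 times the peak memory (14462792K versus 804888K).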
+ | |||
+ | |||
+ | |||
+ | ====== Interactive job example ====== | ||
+ | |||
+ | The basic steps to running a [[: | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | ==== Scheduling interactive job ==== | ||
+ | Create a directory and add [[# | ||
+ | < | ||
+ | [(it_css: | ||
+ | [(it_css: | ||
+ | |||
+ | [(it_css: | ||
+ | [(it_css: | ||
+ | [(it_css: | ||
+ | maxEig.m | ||
+ | </ | ||
+ | Start an interactive session on a compute node with the '' | ||
+ | < | ||
+ | [(it_css: | ||
+ | salloc: Pending job allocation 7985695 | ||
+ | salloc: job 7985695 queued and waiting for resources | ||
+ | salloc: job 7985695 has been allocated resources | ||
+ | salloc: Granted job allocation 7985695 | ||
+ | salloc: Waiting for resource configuration | ||
+ | salloc: Nodes r01n10 are ready for job | ||
+ | [traine@r01n10 matlab_interact]$ | ||
+ | </ | ||
+ | |||
+ | ==== Starting a command mode matlab session ==== | ||
+ | < | ||
+ | [traine@r01n10 matlab_interact]$ vpkg_require matlab/ | ||
+ | Adding package `matlab/ | ||
+ | [traine@r01n10 matlab_interact]$ | ||
+ | </ | ||
+ | < | ||
+ | [traine@r01n10 matlab_interact]$ matlab -nodesktop -nosplash | ||
+ | MATLAB is selecting SOFTWARE OPENGL rendering. | ||
+ | |||
+ | < M A T L A B (R) > | ||
+ | | ||
+ | | ||
+ | | ||
+ | |||
+ | |||
+ | To get started, type doc. | ||
+ | For product information, | ||
+ | |||
+ | >> | ||
+ | </ | ||
+ | |||
+ | |||
+ | |||
+ | |||
+ | |||
+ | ==== Using help as the first command ==== | ||
+ | |||
+ | < | ||
+ | >> help maxEig | ||
+ | | ||
+ | Input parameters | ||
+ | sd - seed for uniform random generator | ||
+ | dim - size of the square matrix (should be odd) | ||
+ | Output value | ||
+ | maxe - maximum real eigvalue | ||
+ | | ||
+ | </ | ||
+ | |||
+ | ==== Calling function once ==== | ||
+ | |||
+ | Use the tic and toc commands to report the elapsed time to generate the random matrix, find all eigenvalues and report the maximum real eigenvalue. | ||
+ | |||
+ | < | ||
+ | >> tic; maxEig(1, | ||
+ | |||
+ | maxe = | ||
+ | |||
+ | | ||
+ | |||
+ | Elapsed time is 54.781289 seconds. | ||
+ | </ | ||
+ | |||
+ | ==== Finishing up ==== | ||
+ | |||
+ | < | ||
+ | >> exit | ||
+ | [traine@r01n10 matlab_interact]$ exit | ||
+ | exit | ||
+ | salloc: Relinquishing job allocation 7985695 | ||
+ | [(it_css: | ||
+ | |||
+ | </ | ||
+ | ===== Interactive parallel toolbox example ===== | ||
+ | This example is based on the '' | ||
+ | |||
+ | When using the Parallel Computing Toolbox, you should log on to a compute node using a workgroup partition, requesting the number of tasks and the amount of memory required: | ||
+ | |||
+ | < | ||
+ | [(it_css: | ||
+ | salloc: Pending job allocation 7993736 | ||
+ | salloc: job 7993736 queued and waiting for resources | ||
+ | salloc: job 7993736 has been allocated resources | ||
+ | salloc: Granted job allocation 7985815 | ||
+ | salloc: Waiting for resource configuration | ||
+ | salloc: Nodes r00g01 are ready for job | ||
+ | [traine@r00g01 matlab_interact]$ vpkg_require matlab/ | ||
+ | [traine@r00g01 matlab_interact]$ matlab -nodesktop -nosplash | ||
+ | </ | ||
+ | | ||
+ | This will effectively reserve 20 CPUs and 40G of memory for your interactive job. The default number of parallel workers when using the parallel toolbox is 12, but you can define the number of workers based on the number of tasks requested. | ||
+ | |||
+ | Here we request 20 workers with the '' | ||
+ | |||
+ | <note important> | ||
+ | |||
+ | It took about 100 seconds for all 20 workers to each produce a result. Because the 20 workers run in parallel, the elapsed time to complete all 200 results is about 918 seconds. | ||
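+ | ||
+ | The arithmetic behind that estimate can be sketched in a few lines of shell. This is only an illustration: the helper name ''estimate_rounds'' is hypothetical, the numbers are copied from the example run, and the sketch assumes the 200 calls are spread evenly over the workers. | ||
+ | ||

```shell
#!/bin/bash
# Rough estimate of how many "waves" of work a pool needs.
# Nothing here talks to Slurm or MATLAB; the values come from the example run.

# estimate_rounds TOTAL_TASKS WORKERS -> ceiling of TOTAL_TASKS / WORKERS
estimate_rounds() {
    local total=$1 workers=$2
    echo $(( (total + workers - 1) / workers ))
}

rounds=$(estimate_rounds 200 20)
echo "rounds: ${rounds}"

# 918 seconds of elapsed time divided over the rounds (integer division)
echo "seconds per round: $(( 918 / rounds ))"
```

+ | ||
+ | With 200 calls and 20 workers this gives 10 rounds of roughly 90 seconds each, which is consistent with the timing reported above. | ||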
+ | |||
+ | < | ||
+ | MATLAB is selecting SOFTWARE OPENGL rendering. | ||
+ | |||
+ | < M A T L A B (R) > | ||
+ | | ||
+ | | ||
+ | | ||
+ | |||
+ | |||
+ | To get started, type doc. | ||
+ | For product information, | ||
+ | |||
+ | >> myCluster = parcluster(' | ||
+ | >> myCluster.NumWorkers = str2double(getenv(' | ||
+ | >> myCluster.JobStorageLocation = getenv(' | ||
+ | >> myPool = parpool(myCluster, | ||
+ | Starting parallel pool (parpool) using the ' | ||
+ | Connected to the parallel pool (number of workers: 20). | ||
+ | >> | ||
+ | |||
+ | maxe = | ||
+ | |||
+ | | ||
+ | |||
+ | |||
+ | maxe = | ||
+ | |||
+ | | ||
+ | |||
+ | |||
+ | maxe = | ||
+ | |||
+ | | ||
+ | |||
+ | ... skipped lines ... | ||
+ | |||
+ | maxe = | ||
+ | |||
+ | | ||
+ | |||
+ | |||
+ | maxe = | ||
+ | |||
+ | | ||
+ | |||
+ | |||
+ | maxe = | ||
+ | |||
+ | | ||
+ | |||
+ | Elapsed time is 918.822702 seconds. | ||
+ | |||
+ | </ | ||
+ | |||
+ | Once the job is completed, delete your pool and exit MATLAB, and release the interactive compute node by typing '' | ||
+ | |||
+ | < | ||
+ | >> delete(myPool); | ||
+ | Parallel pool using the ' | ||
+ | >> exit | ||
+ | [traine@r00g01 matlab_interact]$ exit | ||
+ | exit | ||
+ | salloc: Relinquishing job allocation 7993736 | ||
+ | [(it_css: | ||
+ | </ | ||
+ | ====== MCR array job example ====== | ||
+ | |||
+ | Most Matlab functions can be compiled using the Matlab Compiler ('' | ||
+ | |||
+ | There are two ways to run compiled MATLAB jobs in a shared environment, | ||
+ | - Compile to produce an executable that uses a single computational thread specifying the MATLAB option '' | ||
+ | - Submit the job to use the nodes exclusively specifying the Slurm option '' | ||
+ | |||
+ | You can run more jobs on each node when they are compiled using just one core (Single Computational Thread). | ||
+ | |||
+ | |||
+ | ==== Example compiler commands ==== | ||
+ | |||
+ | Make a new directory '' | ||
+ | < | ||
+ | [(it_css: | ||
+ | [(it_css: | ||
+ | [(it_css: | ||
+ | [(it_css: | ||
+ | </ | ||
+ | The [[# | ||
+ | < | ||
+ | if (isdeployed) | ||
+ | sd = str2num(sd) | ||
+ | dim = str2num(dim) | ||
+ | end | ||
+ | </ | ||
+ | All arguments of the function are taken as tokens on the shell command used to execute the script, | ||
+ | and they are all strings. The ''isdeployed'' block converts them to numbers so | ||
+ | that the rest of the script will behave the same when deployed or executed directly in Matlab. | ||
+ | |||
+ | You can convert this function into a single computational executable by using the Matlab compiler '' | ||
+ | < | ||
+ | prog=maxEig | ||
+ | opt=' | ||
+ | version=' | ||
+ | |||
+ | vpkg_require matlab/ | ||
+ | mcc -R " | ||
+ | |||
+ | [ -d ${WORKDIR}/ | ||
+ | </ | ||
+ | |||
+ | <note tip> | ||
+ | |||
+ | <note tip> | ||
+ | options you want to use at run time. The '' | ||
+ | using MCR. The '' | ||
+ | |||
+ | <note warning> | ||
+ | < | ||
+ | [ -d $WORKDIR/ | ||
+ | </ | ||
+ | </ | ||
+ | |||
+ | ==== Compiling commands ==== | ||
+ | |||
+ | Make the directory where the MaxEig function will be placed when the function is compiled. | ||
+ | |||
+ | < | ||
+ | [(it_css: | ||
+ | </ | ||
+ | <note important> | ||
+ | If you have a permission error, check to make sure that you are in your workgroup. | ||
+ | </ | ||
+ | Now request an interactive compute node and run the '' | ||
+ | |||
+ | < | ||
+ | [(it_css: | ||
+ | salloc: Granted job allocation 9804138 | ||
+ | salloc: Waiting for resource configuration | ||
+ | salloc: Nodes r00n56 are ready for job | ||
+ | [traine@r00n56 MCR_array_II]$ ls | ||
+ | compile.sh | ||
+ | [traine@r00n56 MCR_array_II]$ . compile.sh | ||
+ | Adding package `matlab/ | ||
+ | Compiler version: 7.1 (R2019b) | ||
+ | Dependency analysis by REQUIREMENTS. | ||
+ | Parsing file "/ | ||
+ | (referenced from command line). | ||
+ | Generating file "/ | ||
+ | Generating file " | ||
+ | [traine@r00n56 MCR_array_II]$ ls | ||
+ | compile.sh | ||
+ | [traine@r00n56 MCR_array_II]$ exit | ||
+ | exit | ||
+ | salloc: Relinquishing job allocation 9804138 | ||
+ | [(it_css: | ||
+ | |||
+ | </ | ||
+ | |||
+ | |||
+ | |||
+ | ==== Example queue script file ==== | ||
+ | |||
+ | The '' | ||
+ | ''/ | ||
+ | < | ||
+ | [(it_css: | ||
+ | |||
+ | </ | ||
+ | and make the appropriate changes shown below. | ||
+ | < | ||
+ | ... | ||
+ | 20 #SBATCH --ntasks=2 | ||
+ | ... | ||
+ | 29 #SBATCH --mem=3G | ||
+ | ... | ||
+ | 47 #SBATCH --job-name=maxEig | ||
+ | ... | ||
+ | 58 #SBATCH --partition=_workgroup_ | ||
+ | ... | ||
+ | 85 #SBATCH --output %x-%A-%3a.out | ||
+ | ... | ||
+ | 102 # Setting the job array options | ||
+ | 103 #SBATCH --array=1-200: | ||
+ | ... | ||
+ | 148 # Load a specific Matlab MCR package into the runtime environment: | ||
+ | 149 # | ||
+ | 150 vpkg_require mcr/ | ||
+ | 151 export MCR_CACHE_ROOT=" | ||
+ | 152 | ||
+ | 153 # | ||
+ | 154 # Do standard MCR environment setup: | ||
+ | 155 # | ||
+ | 156 . / | ||
+ | 157 | ||
+ | 158 # | ||
+ | 159 date " | ||
+ | 160 echo "Host ${HOSTNAME}" | ||
+ | 161 | ||
+ | 162 | ||
+ | 163 #Getting the task ID that will be passed as an argument | ||
+ | 164 let seed=$SLURM_ARRAY_TASK_ID | ||
+ | 165 let dim=5001 | ||
+ | 166 | ||
+ | 167 # Execute your MCR program(s) here; prefix with UD_EXEC to | ||
+ | 168 # ensure the job can/will respond to preemption/ | ||
+ | 169 # signals by calling your UD_JOB_EXIT_FN. | ||
+ | 170 # | ||
+ | 171 # Duplicate all three commands for each MCR program you run | ||
+ | 172 # in sequence below. | ||
+ | 173 # | ||
+ | 174 #UD_EXEC my_mcr_program arg1 arg2 | ||
+ | 175 #mcr_rc=$? | ||
+ | 176 #if [ $mcr_rc -ne 0 ]; then exit $mcr_rc; fi | ||
+ | 177 UD_EXEC ${WORKDIR}/ | ||
+ | 178 mcr_rc=$? | ||
+ | 179 if [ $mcr_rc -ne 0 ]; then exit $mcr_rc; fi | ||
+ | 180 | ||
+ | 181 date " | ||
+ | 182 | ||
+ | |||
+ | |||
+ | |||
+ | </ | ||
+ | |||
+ | The two '' | ||
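+ | ||
+ | To make the array settings above more concrete, the following sketch shows how a single array task could derive its seed and its output file name. It is an illustration only: the function ''output_name'' is hypothetical, the fallback values are copied from this example, and the zero padding is assumed to follow the ''%x-%A-%3a.out'' pattern used in the script. | ||
+ | ||

```shell
#!/bin/bash
# Sketch: how one array task derives its inputs and output file name.
# Outside of Slurm these variables are unset, so example values are used as fallbacks.
job_name=${SLURM_JOB_NAME:-maxEig}
array_job_id=${SLURM_ARRAY_JOB_ID:-9805558}
task_id=${SLURM_ARRAY_TASK_ID:-125}

# Mirrors the queue script: the task ID is used as the random seed.
seed=$task_id
dim=5001

# output_name NAME JOBID TASKID -> file name matching the pattern %x-%A-%3a.out
output_name() {
    printf '%s-%s-%03d.out\n' "$1" "$2" "$3"
}

echo "seed=${seed} dim=${dim}"
echo "output file: $(output_name "$job_name" "$array_job_id" "$task_id")"
```

+ | ||
+ | Each task therefore runs with a different seed and writes to its own file, so the 200 results never overwrite each other. | ||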
+ | |||
+ | ==== Running Compiled Matlab Example In Workgroup And Analyzing Output Results ==== | ||
+ | |||
+ | To test the example compiled Matlab job on the '' | ||
+ | then submitted with sbatch. | ||
+ | < | ||
+ | [(it_css: | ||
+ | Submitted batch job 9805558 | ||
+ | </ | ||
+ | |||
+ | The assigned job ID is 9805558. | ||
+ | maxEig-9805558-001.out | ||
+ | Each file contains the output of one task. For example, for task ID 125: | ||
+ | < | ||
+ | Adding package `mcr/ | ||
+ | -- OpenMP job setup complete: | ||
+ | -- OMP_THREAD_LIMIT | ||
+ | -- OMP_PROC_BIND | ||
+ | -- OMP_PLACES | ||
+ | -- MP_BLIST | ||
+ | |||
+ | -- Matlab MCR environment setup complete (on r00n10): | ||
+ | -- MCR_ROOT | ||
+ | -- MCR_CACHE_ROOT | ||
+ | |||
+ | Start 1604084921 | ||
+ | Host r00n10.localdomain.hpc.udel.edu | ||
+ | |||
+ | sd = | ||
+ | |||
+ | 125 | ||
+ | |||
+ | |||
+ | dim = | ||
+ | |||
+ | 5001 | ||
+ | |||
+ | |||
+ | maxe = | ||
+ | |||
+ | | ||
+ | |||
+ | Finish 1604085004 | ||
+ | </ | ||
+ | |||
+ | Now we will use [[# | ||
+ | Use the link to copy the Perl code, and then create a new file in your current directory with the same name '' | ||
+ | |||
+ | <note important> | ||
+ | After copying the code you will need to make sure that you change the job id value in the pattern variable to match **your job id** | ||
+ | </ | ||
+ | |||
+ | < | ||
+ | [(it_css: | ||
+ | avgMaxEig = 69.5131125 | ||
+ | </ | ||
+ | The script will also create three new .data files and one new .txt file. We are really only interested in the results8012246.data and the wikimaxEig.txt files. Examples of them are shown below. | ||
+ | <code result9805282.data> | ||
+ | sd dim maxe | ||
+ | 1 5001 70.0220 | ||
+ | 2 5001 71.7546 | ||
+ | 3 5001 70.8331 | ||
+ | 4 5001 70.5714 | ||
+ | 5 5001 69.4923 | ||
+ | .... | ||
+ | 195 5001 68.7440 | ||
+ | 196 5001 71.5652 | ||
+ | 197 5001 69.8530 | ||
+ | 198 5001 70.1213 | ||
+ | 199 5001 70.7535 | ||
+ | 200 5001 67.4221 | ||
+ | </ | ||
+ | |||
+ | These are the same results we got from both the MATLAB loop and the parallel toolbox, but they were computed | ||
+ | in just under 9 minutes (525 seconds of elapsed time). | ||
+ | |||
+ | === wiki9805558.txt Output: === | ||
+ | SGE array job started Fri 30 Oct 2020 03:04:16 PM EDT | ||
+ | |||
+ | Used a total of 16585 CPU seconds over 525 seconds of elapsed time on 2 nodes | ||
+ | ^ Node ^^ Real Clock Time ^^^ | ||
+ | ^ Name ^ | ||
+ | |r00n10.localdomain.hpc.udel.edu| | ||
+ | |r00n47.localdomain.hpc.udel.edu| | ||
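+ | ||
+ | The totals in this summary can be turned into a few derived figures. The sketch below is a back-of-the-envelope calculation only: the helper ''ratio'' is hypothetical, the inputs are the numbers reported above, and "average cores busy" is simply CPU seconds divided by elapsed seconds. | ||
+ | ||

```shell
#!/bin/bash
# Derived figures from the summary: CPU seconds, elapsed seconds, array size.
cpu_seconds=16585     # total CPU seconds reported
elapsed_seconds=525   # elapsed wall-clock seconds reported
tasks=200             # number of tasks in the job array

# ratio NUM DEN -> NUM / DEN printed with one decimal place
ratio() {
    awk -v n="$1" -v d="$2" 'BEGIN { printf "%.1f\n", n / d }'
}

echo "average cores busy:   $(ratio "$cpu_seconds" "$elapsed_seconds")"
echo "CPU seconds per task: $(ratio "$cpu_seconds" "$tasks")"
```

+ | ||
+ | On these numbers roughly 32 cores were busy on average, and each task used a little over 80 CPU seconds. | ||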
+ | |||
+ | |||
+ | Using gnuplot we get a time chart of usage on the 2 nodes and total CPU usage. | ||
+ | Add the following code to a new file named '' | ||
+ | <code plot9805558.gnuplot> | ||
+ | set terminal png size 640,640 | ||
+ | set output " | ||
+ | set multiplot layout 2,1 | ||
+ | set xrange [0:550] | ||
+ | set yrange [0:80] | ||
+ | set key on | ||
+ | set title "Tasks on 2 nodes by time (seconds)" | ||
+ | set key on | ||
+ | plot " | ||
+ | set title "User time usage rate on all nodes" | ||
+ | plot " | ||
+ | |||
+ | </ | ||
+ | <note important> | ||
+ | To create the plot we will need to request an interactive compute node on the devel partition. Once the request has been filled we will need to use VALET to load the gnuplot application and run the '' | ||
+ | < | ||
+ | [(it_css: | ||
+ | salloc: Granted job allocation 9805971 | ||
+ | salloc: Waiting for resource configuration | ||
+ | salloc: Nodes r00n56 are ready for job | ||
+ | [traine@r00n56 MCR_array_II]$ vpkg_require gnuplot | ||
+ | Adding package `gnuplot/ | ||
+ | [traine@r00n56 MCR_array_II]$ gnuplot plot9805558.gnuplot | ||
+ | [traine@r00n56 MCR_array_II]$ exit | ||
+ | [(it_css: | ||
+ | </ | ||
+ | <note important> | ||
+ | An example of the '' | ||
+ | {{ : | ||
+ | |||
+ | |||
+ | ==== Perl Script For Compiled Matlab | ||
+ | <note important> | ||
+ | **DO NOT COPY AND PASTE THIS CODE. IT WILL MOST LIKELY NOT FORMAT CORRECTLY, WHICH WILL BREAK THE CODE. INSTEAD DOWNLOAD THE FILE WITH '' | ||
+ | </ | ||
+ | |||
+ | <file perl wikigather.pl> | ||
+ | $pattern = ' | ||
+ | $countFile = ' | ||
+ | $usageFile = ' | ||
+ | $nodeUsageFile = " | ||
+ | $nodeUsageFiles = " | ||
+ | @varNames = qw/sd dim maxe/; # used for columns in resultfile | ||
+ | $resultFile = " | ||
+ | & | ||
+ | |||
+ | @node = sort keys %hostCount; | ||
+ | |||
+ | foreach $jobid (keys %startTime) { | ||
+ | my $file = sprintf " | ||
+ | open(WIKI, "> | ||
+ | print WIKI `date -d \@$startTime{$jobid} +\"SGE array job started %c\n" | ||
+ | print WIKI "Used a total of $userTotal{$jobid} CPU seconds "; | ||
+ | print WIKI "over ", | ||
+ | print WIKI "on ", | ||
+ | |||
+ | $baseTime = $startTime{$jobid} if (!defined $baseTime or $startTime{$jobid} < $baseTime); | ||
+ | |||
+ | $avgMaxEig=0; | ||
+ | $count=0; | ||
+ | if ($resultFile) { | ||
+ | my $file = sprintf $resultFile, | ||
+ | open(DATA, "> | ||
+ | print DATA " | ||
+ | foreach $task (sort { $a <=> $b } keys %{$result{$jobid}}) { | ||
+ | my %var = split($;, | ||
+ | print DATA " | ||
+ | $avgMaxEig += $var{' | ||
+ | $count += 1; | ||
+ | } | ||
+ | close(DATA); | ||
+ | print ' | ||
+ | } | ||
+ | |||
+ | printf WIKI "^ %18s ^^ %30s ^^^ %12s ^\n"," | ||
+ | printf WIKI "^ %8s ^ %8s ^ %9s ^ %9s ^ %9s ^ %12s ^\n"," | ||
+ | foreach (@node) { | ||
+ | if ( $hostCountByJob{$jobid}{$_} > 0) { | ||
+ | printf WIKI " | ||
+ | $hostRealMin{$jobid}{$_}, | ||
+ | $hostReal{$jobid}{$_}/ | ||
+ | $hostUser{$jobid}{$_}/ | ||
+ | } | ||
+ | } | ||
+ | close(WIKI); | ||
+ | } | ||
+ | |||
+ | if ($countFile and open(DATA,"> | ||
+ | my(@col, | ||
+ | $col[$_] = 0 for $[ .. $#node; | ||
+ | foreach $time (sort { $a <=> $b } keys %timeCount) { | ||
+ | printf DATA "%d %s\n", $time-$baseTime, | ||
+ | $byNode{$_} += $timeCount{$time}{$_} foreach keys %{$timeCount{$time}}; | ||
+ | $count=0; | ||
+ | $col[$_] = $count += $byNode{$node[$_]} for $[ .. $#node; | ||
+ | printf DATA "%d %s\n", $time-$baseTime, | ||
+ | } | ||
+ | close(DATA); | ||
+ | } | ||
+ | |||
+ | if ($usageFile and open(DATA,"> | ||
+ | my ($time, $lastTime, $slope, $usage); | ||
+ | foreach $time (sort { $a <=> $b } keys %timeRate) { | ||
+ | $usage += $slope*($time - $lastTime); | ||
+ | $slope += $timeRate{$time}{$_} foreach keys %{$timeRate{$time}}; | ||
+ | printf DATA "%d %.4f %.4f\n", | ||
+ | $lastTime = $time; | ||
+ | } | ||
+ | close(DATA); | ||
+ | } | ||
+ | |||
+ | if ($countFile and $usageFile) { | ||
+ | foreach $jobid (keys %startTime) { | ||
+ | my $plotTitle = ' | ||
+ | my(@plot); | ||
+ | $plot[$_] = " | ||
+ | for $[ .. $#node; | ||
+ | my $plotTop = join(",", | ||
+ | my $titleTop = sprintf $plotTitle, 0+@node." | ||
+ | my $key = " | ||
+ | my ($t1,$t2) = (30*int(($startTime{$jobid}-$baseTime)/ | ||
+ | $titleTop = sprintf $plotTitle, "nodes @node" if $#node < 5; | ||
+ | $key = "out horiz top right" if $#node < 9; | ||
+ | | ||
+ | open (PLOT, "| gnuplot" | ||
+ | print PLOT <<" | ||
+ | set term pngcairo font " | ||
+ | set output " | ||
+ | set multiplot layout 2,1 | ||
+ | set xrange [$t1:$t2] | ||
+ | set key $key | ||
+ | set ylabel " | ||
+ | plot $plotTop | ||
+ | set key out horiz top right | ||
+ | set ylabel "Total CPU usage" | ||
+ | set xlabel "Time (seconds)" | ||
+ | plot " | ||
+ | EOP | ||
+ | } | ||
+ | } | ||
+ | |||
+ | sub scanfile { | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | local $/ = undef; | ||
+ | while (< | ||
+ | study; | ||
+ | /^Host (\S+)/m and $host=$1; | ||
+ | /^Start (\d+)/m and $start=$1; | ||
+ | /^Finish (\d+)/m and $finish=$1; | ||
+ | /^SIGUSR1 (\d+)/m and $usr1=$1; | ||
+ | /^SIGUSR2 (\d+)/m and $usr2=$1; | ||
+ | / | ||
+ | / | ||
+ | / | ||
+ | while(/ | ||
+ | } | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | $real = $finish-$start if($real==0); | ||
+ | $user = $real-$sys if($user==0); | ||
+ | | ||
+ | | ||
+ | | ||
+ | |||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | $real > $hostRealMax{$jobid}{$host}); | ||
+ | | ||
+ | $real < $hostRealMin{$jobid}{$host}); | ||
+ | | ||
+ | |||
+ | | ||
+ | | ||
+ | | ||
+ | | ||
+ | } | ||
+ | |||
+ | sub scandir { | ||
+ | | ||
+ | | ||
+ | |||
+ | | ||
+ | | ||
+ | next if -l " | ||
+ | push @file,$_ if /$pattern/; # save files with this pattern | ||
+ | push @dir,$_ if -d " | ||
+ | } | ||
+ | | ||
+ | |||
+ | | ||
+ | & | ||
+ | } | ||
+ | |||
+ | | ||
+ | & | ||
+ | } | ||
+ | } | ||
+ | |||
+ | </ | ||
+ | |||
+ | ====== Adding checkpoints Matlab job example ====== | ||
+ | |||
+ | Adding [[abstract: | ||
+ | |||
+ | |||
+ | ==== Gathering code for job example ==== | ||
+ | First, we'll create a new directory and copy the needed code into it. | ||
+ | |||
+ | < | ||
+ | [(it_css: | ||
+ | [(it_css: | ||
+ | [(it_css: | ||
+ | [(it_css: | ||
+ | </ | ||
+ | |||
+ | You will also want to put a copy of the [[# | ||
+ | |||
+ | Now we will need to make changes to '' | ||
+ | < | ||
+ | % script to run maxEig function 200 times and print average. | ||
+ | count = 200; | ||
+ | dim = 5001; | ||
+ | sumMaxe = 0; | ||
+ | i = 0; | ||
+ | id = str2num(getenv(' | ||
+ | rc = 0; | ||
+ | rc = str2num(getenv(' | ||
+ | tic; | ||
+ | if isempty(rc); | ||
+ | for i=1:count; | ||
+ | sumMaxe = sumMaxe + maxEig(i, | ||
+ | counter = " | ||
+ | disp(counter); | ||
+ | end; | ||
+ | else | ||
+ | | ||
+ | | ||
+ | | ||
+ | if fileID == -1 | ||
+ | | ||
+ | end | ||
+ | | ||
+ | |||
+ | % Read lines from the file and search for the target string | ||
+ | while ~feof(fileID) | ||
+ | line = fgetl(fileID); | ||
+ | if ischar(line) | ||
+ | lineNumber = lineNumber + 1; | ||
+ | if ~isempty(strfind(line, | ||
+ | num=regexp(line,' | ||
+ | counterNumber = str2double(num{1}{1}); | ||
+ | end | ||
+ | end | ||
+ | end | ||
+ | fclose(fileID); | ||
+ | for i =counterNumber: | ||
+ | | ||
+ | | ||
+ | | ||
+ | end; | ||
+ | end; | ||
+ | toc | ||
+ | avgMaxEig = sumMaxe/ | ||
+ | quit | ||
+ | |||
+ | </ | ||
+ | |||
+ | The following changes will need to be added to batch.qs. The option '' | ||
+ | < | ||
+ | ... | ||
+ | 40 #SBATCH --job-name=checkpoint | ||
+ | ... | ||
+ | 60 #SBATCH --time=0-01: | ||
+ | ... | ||
+ | 75 #SBATCH --output=%x-%j.out | ||
+ | 76 #SBATCH --error=%x-%j.out | ||
+ | ... | ||
+ | 85 #SBATCH --mail-user=' | ||
+ | 86 #SBATCH --mail-type=END, | ||
+ | 87 #SBATCH --requeue # allow job requeue | ||
+ | 88 #SBATCH --open-mode=append # the output will append | ||
+ | ... | ||
+ | 90 max_restarts=1 | ||
+ | 91 scontext=$(scontrol show job $SLURM_JOB_ID) | ||
+ | 92 restarts=$(echo " | ||
+ | 93 job_exit_handler() { | ||
+ | 94 counter=$(tail -n 2 ${SLURM_JOB_NAME}-${SLURM_JOB_ID}.out | head -n 1) | ||
+ | 95 echo "Job ${SLURM_JOB_NAME} ended on ${counter}" | ||
+ | 96 if [[ $restarts -lt $max_restarts ]];then | ||
+ | 97 scontrol requeue ${SLURM_JOB_ID} # | ||
+ | 98 # | ||
+ | 99 # Copy all our output files back to the original job directory: | ||
+ | 100 #cp * " | ||
+ | 101 | ||
+ | 102 # Don't call again on EXIT signal, please: | ||
+ | 103 trap - EXIT | ||
+ | 104 exit 0 | ||
+ | 105 else | ||
+ | 106 trap - EXIT | ||
+ | 107 echo "Your job is over the Maximum restarts limit" | ||
+ | 108 exit 1 | ||
+ | 109 fi | ||
+ | 110 } | ||
+ | 111 | ||
+ | 112 export UD_JOB_EXIT_FN=job_exit_handler | ||
+ | ... | ||
+ | 142 # | ||
+ | 143 #srun date | ||
+ | 144 export UD_JOB_EXIT_FN_SIGNALS=" | ||
+ | 145 #Loading MATLAB | ||
+ | 146 vpkg_require matlab/ | ||
+ | 147 #Running the matlab script | ||
+ | 148 UD_EXEC matlab -nodisplay -nojvm -batch "try; script; catch ERR; disp(job_exit_handler(ERR.getReport)); | ||
+ | |||
+ | </ | ||
+ | ==== Running the checkpoint job and its output ==== | ||
+ | We know from the MCR example that this script takes between 2 and 3 hours to run. In the changes we made to '' | ||
+ | |||
+ | < | ||
+ | [(it_css: | ||
+ | Submitted batch job 20426672 | ||
+ | </ | ||
+ | After the wall clock runs out we will see the following output. | ||
+ | < | ||
+ | [(it_css: | ||
+ | Adding package `matlab/ | ||
+ | -- Registered exit function ' | ||
+ | |||
+ | |||
+ | maxe = | ||
+ | |||
+ | | ||
+ | |||
+ | counter: 1 | ||
+ | |||
+ | maxe = | ||
+ | |||
+ | | ||
+ | |||
+ | counter: 2 | ||
+ | |||
+ | maxe = | ||
+ | |||
+ | | ||
+ | |||
+ | counter: 3 | ||
+ | |||
+ | maxe = | ||
+ | |||
+ | | ||
+ | |||
+ | counter: 4 | ||
+ | |||
+ | maxe = | ||
+ | |||
+ | | ||
+ | |||
+ | counter: 5 | ||
+ | ... | ||
+ | maxe = | ||
+ | |||
+ | | ||
+ | |||
+ | counter: 53 | ||
+ | slurmstepd: error: *** JOB 20426672 ON r01n13 CANCELLED AT 2023-10-03T17: | ||
+ | Job 20426672 ended on counter: 53 | ||
+ | Adding package `matlab/ | ||
+ | -- Registered exit function ' | ||
+ | |||
+ | |||
+ | maxe = | ||
+ | |||
+ | | ||
+ | |||
+ | counter: 53 | ||
+ | |||
+ | maxe = | ||
+ | |||
+ | | ||
+ | |||
+ | counter: 54 | ||
+ | |||
+ | maxe = | ||
+ | |||
+ | | ||
+ | |||
+ | counter: 55 | ||
+ | ... | ||
+ | maxe = | ||
+ | |||
+ | | ||
+ | |||
+ | counter: 104 | ||
+ | |||
+ | maxe = | ||
+ | |||
+ | | ||
+ | |||
+ | counter: 105 | ||
+ | slurmstepd: error: *** JOB 20426672 ON r01n13 CANCELLED AT 2023-10-03T18: | ||
+ | Job 20426672 ended on counter: 105 | ||
+ | Your job is over the Maximum restarts limit | ||
+ | </ | ||
+ | |||
+ | Now we know that the script completed about 53 of the 200 loop iterations before the wall clock expired. It then restarted from iteration 53 and finally stopped at 105 because it reached the maximum restart limit we set. | ||
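+ | ||
+ | The restart point is recovered from the ''counter: N'' lines in the job output file. The following shell sketch shows one way to extract the last counter value. It is not the code used by the job itself: the function ''last_counter'' is hypothetical and the file contents are a made-up stand-in for a real output file. | ||
+ | ||

```shell
#!/bin/bash
# Sketch: recover the last completed loop counter from a job output file.

# last_counter FILE -> number from the final "counter: N" match, or nothing if absent
last_counter() {
    grep -o 'counter: [0-9][0-9]*' "$1" | tail -n 1 | awk '{ print $2 }'
}

# Build a small stand-in for a real output file (hypothetical content).
demo_out=$(mktemp)
printf 'counter: 1\n\nmaxe =\n\ncounter: 2\n\ncounter: 53\nslurmstepd: error: *** JOB CANCELLED\n' > "$demo_out"

echo "resume from counter: $(last_counter "$demo_out")"
rm -f "$demo_out"
```

+ | ||
+ | Because the output file is opened in append mode, the last match always reflects the most recent run, even after a requeue. | ||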
+ | |||
+ | <note tip>If you don't want to wait the full amount of time that the wall clock is set to you can use the command " | ||
+ | < | ||
+ | [(it_css: | ||
+ | Submitted batch job 8390581 | ||
+ | [(it_css: | ||
+ | [(it_css: | ||
+ | -- Registered exit function ' | ||
+ | |||
+ | Adding package `matlab/ | ||
+ | | ||
+ | maxe = | ||
+ | |||
+ | | ||
+ | |||
+ | counter: 1 | ||
+ | ... | ||
+ | |||
+ | maxe = | ||
+ | |||
+ | | ||
+ | |||
+ | counter: 6 | ||
+ | |||
+ | maxe = | ||
+ | |||
+ | | ||
+ | |||
+ | counter: 7 | ||
+ | slurmstepd: error: *** JOB 8390581 ON r00n17 CANCELLED AT 2020-05-21T10: | ||
+ | Job checkpoint ended on counter: 7 | ||
+ | </ | ||
+ | </ |