====== Computational models for running Matlab on a shared cluster ======
By default, MATLAB uses multiple computational threads. From the MATLAB R2011b documentation:

> ''matlab -singleCompThread'' limits MATLAB to a single computational thread. By default, MATLAB makes use of the multithreading capabilities of the computer on which it is running.
The default, multiple computational threads, is never a good option when you are sharing a node.
Either start MATLAB with the ''-singleCompThread'' option, or schedule the job with exclusive access using the scheduler's exclusive option: ''-l exclusive=1'' for Grid Engine or ''#SBATCH --exclusive'' for Slurm.
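For example, a minimal sketch of a batch script for a shared node, modeled on the test scripts later on this page (the script name ''script'' and the package version are illustrative):

  #$ -pe threads 1
  #$ -l m_mem_free=3.1G
  vpkg_require matlab/r2014b
  matlab -nodisplay -nojvm -singleCompThread -r 'script'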
Scheduling a node with exclusive access does not mean MATLAB will use all of its cores and memory; monitor the job to determine its actual memory and core requirements. To take advantage of multiple cores you must use the built-in matrix functions. You should see CPU utilization above 100% while the matrix functions are executing.
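To see this multithreading in action, here is a sketch of a script dominated by a built-in matrix function (the size ''n'' is an arbitrary illustration, not the test script used for the measurements on this page):

  n = 5000;        % arbitrary illustrative size
  A = randn(n);    % dense random matrix
  tic
  e = eig(A);      % built-in function; MKL spreads the work over threads
  toc
  m = max(abs(e))  % largest eigenvalue magnitude

While ''eig'' runs, ''ps'' or ''top'' should report the MATLAB process well above 100% CPU.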
MATLAB can also, with the Distributed Computing Toolbox, create a parallel pool of workers to which work is dispatched in parallel.
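A minimal sketch of such a pool (''parpool'' exists in R2013b and later, earlier releases used ''matlabpool''; the worker count here is an arbitrary illustration):

  pool = parpool('local', 4);   % start 4 workers on this node
  s = 0;
  parfor i = 1:1000             % iterations are dispatched to the workers
      s = s + i^2;              % s is a reduction variable
  end
  delete(pool)                  % release the workers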
===== Multiple computational threads on one node =====
MATLAB makes use of the multithreading capabilities of the computer on which it is running. MATLAB uses MKL as its BLAS and LAPACK backend; the versions in use can be displayed with the MATLAB commands:

  version -blas
  version -lapack
To make full use of the MKL computational threads you need to use the built-in matrix functions. The work needed to execute a built-in function is distributed over multiple cores using MKL threads, which are compatible with OpenMP threads. All the cores share the same memory, so this is also called the shared-memory model of parallel computing. A simple model of the total MATLAB job's performance on a 20-core node is

  CPU = (p*20 + (1-p))*WALL

where ''WALL'' is the elapsed wall-clock time, ''p'' is the fraction of the work executed in parallel on all 20 cores, and ''CPU'' is the total CPU time summed over all cores.
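As a check against the measurements below, the 20-core job 418768 used 5246.490 s of CPU in 882.568 s of wall clock, so

  CPU/WALL = 5246.490/882.568 ≈ 5.9
  5.9 = p*20 + (1-p)  =>  p ≈ 4.9/19 ≈ 0.26

i.e. roughly a quarter of that script's work ran in parallel on all 20 cores.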
The actual number of computational threads is not explicitly stated in the Unix documentation. For Windows, the documentation specifies that MATLAB will use all the cores on the machine; this is clearly not appropriate for Unix clusters. Observations on Mills show that MATLAB may use all the cores, but on average uses far fewer. To use more than one core, the MATLAB job must be written to use the standard high-performance libraries (MKL) linked into the MATLAB executable. This works
well, but is not optimized for Mills' processors or threading libraries.
==== Test batch jobs using Grid Engine ====
Several copies of the same MATLAB script were submitted to run simultaneously; only the batch script directives varied.
=== Batch job with exclusive access (only job on node) ===
Part of batch script file:
$ tail -4 batche.qs
#$ -l exclusive=1
vpkg_require matlab/r2014b
matlab -nodisplay -nojvm -r 'script'
CGROUP report from batch output file:
$ grep CGROUPS *.o425422
[CGROUPS] No /cgroup/memory/UGE/425422.1 exists for this job
[CGROUPS] UD Grid Engine cgroup setup commencing
[CGROUPS] Setting none bytes (vmem none bytes) on n171 (master)
[CGROUPS] with 20 cores =
[CGROUPS] done.
Memory and timing results:
$ qacct -h n171 -j 425422 | egrep '(start|maxvmem|maxrss|cpu|wallclock|failed)'
start_time 02/16/2016 13:52:16.213
failed 0
ru_wallclock 8037.427
ru_maxrss 658584
cpu 53089.736
maxvmem 2.882G
maxrss 644.949M
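Note that even with exclusive access MATLAB did not keep all 20 cores busy; on average

  cpu / ru_wallclock = 53089.736 / 8037.427 ≈ 6.6 effective cores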
=== Batch job with 5 slots, 370 MB per core (1.85 GB total) ===
Part of batch script file:
$ tail -6 batch5.qs
#$ -pe threads 5
#$ -l mem_total=1.9G
#$ -l m_mem_free=370M
vpkg_require matlab/r2015a
matlab -nodisplay -nojvm -r 'script'
CGROUP report from batch output file:
$ grep CGROUPS *.o428562
[CGROUPS] UD Grid Engine cgroup setup commencing
[CGROUPS] Setting 388050944 bytes (vmem none bytes) on n139 (master)
[CGROUPS] with 5 cores =
[CGROUPS] done.
Memory and timing results:
$ qacct -h n139 -j 428562 | egrep '(start|maxvmem|maxrss|cpu|wallclock|failed)'
start_time 02/17/2016 18:22:54.254
failed 0
ru_wallclock 5.297
ru_maxrss 165232
cpu 3.090
maxvmem 1017.906M
maxrss 155.109M
=== Batch job with 4 slots, 1 GB per core (4 GB total) ===
Part of batch script file:
$ cat batch.qs
#$ -pe threads 4
#$ -l m_mem_free=1G
vpkg_require matlab/r2014b
matlab -nodisplay -nojvm -r 'script'
CGROUP report from batch output file:
$ grep CGROUPS *.o418695
[CGROUPS] UD Grid Engine cgroup setup commencing
[CGROUPS] Setting 1073741824 bytes (vmem none bytes) on n036 (master)
[CGROUPS] with 4 cores = 0 2 4 6
[CGROUPS] done.
This job shares the node with the previous job on cores 5-8.
Memory and timing results:
$ qacct -h n036 -j 418695 | egrep '(maxvmem|maxrss|cpu|wallclock|failed)'
failed 0
ru_wallclock 826.759
ru_maxrss 595188
cpu 1629.194
maxvmem 1.801G
maxrss 583.039M
=== Batch job with 3 slots, 1 GB per core (3 GB total) ===
Part of batch script file:
$ cat batch.qs
#$ -pe threads 3
#$ -l m_mem_free=1G
vpkg_require matlab/r2014b
matlab -nodisplay -nojvm -r 'script'
CGROUP report from batch output file:
$ grep CGROUPS *.o408597
[CGROUPS] UD Grid Engine cgroup setup commencing
[CGROUPS] Setting 3221225472 bytes (vmem 9223372036854775807 bytes) on n039 (master)
[CGROUPS] with 3 cores = 0-2
[CGROUPS] done.
Memory and timing results:
$ qacct -h n039 -j 408597 | egrep '(maxvmem|maxrss|cpu|wallclock)'
ru_wallclock 13877.991
ru_maxrss 2089812
cpu 90776.109
maxvmem 4.180G
maxrss 0.000
=== Batch job with 2 slots, 3.1 GB per core (6.2 GB total) ===
3.1 GB per core on a 20-core node is 62 GB, which allows ten of these 2-slot jobs to fill the node with 2 GB to spare for system overhead.
Part of batch script file:
$ cat batch.qs
#$ -pe threads 2
#$ -l m_mem_free=3.1G
vpkg_require matlab/r2014b
matlab -nodisplay -nojvm -r 'script'
CGROUP report from batch output file:
$ grep CGROUPS *.o408598
[CGROUPS] UD Grid Engine cgroup setup commencing
[CGROUPS] Setting 6657200128 bytes (vmem 9223372036854775807 bytes) on n039 (master)
[CGROUPS] with 2 cores = 3-4
[CGROUPS] done.
This job shares the node with the previous job; it runs on cores 3-4.
Memory and timing results:
$ qacct -h n039 -j 408598 | egrep '(maxvmem|maxrss|cpu|wallclock)'
ru_wallclock 13904.972
ru_maxrss 2152212
cpu 92110.859
maxvmem 4.208G
maxrss 0.000
=== Batch job with 1 slot, 3.1 GB per core (3.1 GB total) ===
3.1 GB per core on a 20-core node is 62 GB, which allows 20 of these single-slot jobs to fit with 2 GB to spare for system overhead.
Part of batch script file:
$ cat batch.qs
#$ -l m_mem_free=3.1G
vpkg_require matlab/r2014b
matlab -nodisplay -nojvm -r 'script'
CGROUP report from batch output file:
$ grep CGROUPS *.o408599
[CGROUPS] UD Grid Engine cgroup setup commencing
[CGROUPS] Setting 3328602112 bytes (vmem 9223372036854775807 bytes) on n036 (master)
[CGROUPS] with 1 core = 0
[CGROUPS] done.
Memory and timing results:
$ qacct -h n036 -j 408599 | egrep '(maxvmem|maxrss|cpu|wallclock)'
ru_wallclock 8607.872
ru_maxrss 1935860
cpu 51805.427
maxvmem 4.036G
maxrss 0.000
==== Table ====
^ ^^ requested ^^ used memory and time ^^^
^ jobid ^ host ^ cores ^ memory ^ maxvmem ^ cpu (s) ^ wallclock (s) ^
| 408594 | n038 | all 20 | all <64GB | 4.155G | 51321.533 | 8613.132 |
| 408595 | n037 | 5 | 5G | 4.043G | 86578.676 | 13051.171 |
| 408596 | n037 | 4 | 4G | 4.301G | 86330.547 | 13067.863 |
| 408597 | n039 | 3 | 3G | 4.180G | 90776.109 | 13877.991 |
| 408598 | n039 | 2 | 6.2G | 4.208G | 92110.859 | 13904.972 |
| 408599 | n036 | default 1 | 3.1G | 4.036G | 51805.427 | 8607.872 |
==== Table: new runs spread over nodes ====
^ ^^ requested ^^ used memory and time ^^^
^ jobid ^ host ^ cores ^ memory ^ maxvmem ^ cpu (s) ^ wallclock (s) ^
| 418705 | n172 | all 20 | all <64GB | 2.904G | 5553.820 | 1089.789 |
| 418704 | n039 | 5 | 5G | 1.874G | 1778.309 | 804.490 |
| 418695 | n036 | 4 | 4G | 1.801G | 1629.194 | 826.759 |
| 418693 | n037 | 3 | 3G | 1.735G | 1475.837 | 863.386 |
| 418691 | n040 | 2 | 6.2G | 1.662G | 1334.752 | 944.711 |
| 418690 | n038 | default 1 | 1G | 1.536G | 1164.087 | 1173.832 |
==== Table: new runs on the same node ====
^ ^^ requested ^^ used memory and time ^^^^
^ jobid ^ host ^ cores ^ memory ^ maxvmem ^ maxrss ^ cpu (s) ^ wallclock (s) ^
| 418768 | n172 | all 20 | all <64GB | 3.805G | 1.633G | 5246.490 | 882.568 |
| 418773 | n036 | 5 | 5G | 1.852G | 578.457M | 1953.868 | 930.284 |
| 418772 | n036 | 4 | 4G | 1.779G | 579.109M | 1800.191 | 949.475 |
| 418771 | n036 | 3 | 3G | 1.709G | 570.246M | 1660.543 | 996.545 |
| 418770 | n036 | 2 | 6.2G | 1.640G | 557.363M | 1543.664 | 1106.315 |
| 418769 | n036 | default 1 | 1G | 1.514G | 564.840M | 1356.694 | 1356.256 |
==== Graphs ====
As the number of cores increases, both the CPU time and the memory usage increase linearly. The increased memory is explained by the need for //private memory//, memory that is not shared. Parallel algorithms can sometimes achieve a faster wall-clock time by recalculating some values, which increases the total CPU time.
{{:clusters:matlab:maxeigcpu.png?nolink&640|}}
{{:clusters:matlab:maxeigmem.png?640|}}
Both CPU time and memory are costs of running your algorithm, since they limit the number of other users that can use the node.
To capture both, consider a simple cost defined as CPU time multiplied by memory, in GB·hours. Thus we have two objectives:
* Reduce the run time
* Reduce the cost
{{:clusters:matlab:maxeigcost.png?640|}}
The two extremes of the Pareto curve are both good choices: all the cores of one node gives the fastest run time, and one core is the least costly (so you can run 20 jobs simultaneously). The 4-core job is a good compromise.
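For example, computing the cost from the //same node// table above (cpu in seconds times maxvmem in GB, divided by 3600):

  20 cores: 5246.490 * 3.805 / 3600 ≈ 5.5 GB·hours
   4 cores: 1800.191 * 1.779 / 3600 ≈ 0.89 GB·hours
   1 core:  1356.694 * 1.514 / 3600 ≈ 0.57 GB·hours

The 4-core job costs little more than the 1-core job while finishing in about 70% of the wall time.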
==== Commands while running ====
Set a shell variable with the name of the node to examine; the commands below use it with ''ssh'':

  $ n=n182
**''ps''** command
$ ssh $n ps -eo pid,ruser,pcpu,pmem,thcount,stime,time,command | egrep '(COMMAND|matlab)'
PID RUSER %CPU %MEM THCNT STIME TIME COMMAND
96970 traine 182 0.8 10 13:52 05:51:25 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -r script -nojvm
96971 traine 160 0.8 9 13:52 05:09:03 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -r script -nojvm
96972 traine 119 0.8 7 13:52 03:50:15 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -r script -nojvm
96974 traine 141 0.8 8 13:52 04:33:14 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -r script -nojvm
97005 traine 99.5 0.8 5 13:52 03:11:43 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -r script -nojvm
97130 traine 99.4 0.8 5 13:52 03:11:27 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -singleCompThread -r script -nojvm
**''ps''** command to get threads for one PID
$ ssh $n ps -eLf | egrep '(PID|96970)' | grep -v ' 0 '
UID PID PPID LWP C NLWP STIME TTY TIME CMD
traine 96970 96222 97281 95 10 13:52 ? 03:04:20 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -r script -nojvm
traine 96970 96222 97314 21 10 13:52 ? 00:41:58 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -r script -nojvm
traine 96970 96222 97315 21 10 13:52 ? 00:41:43 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -r script -nojvm
traine 96970 96222 97316 21 10 13:52 ? 00:40:54 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -r script -nojvm
traine 96970 96222 97317 22 10 13:52 ? 00:43:43 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -r script -nojvm
$ ssh $n ps -eLf | egrep '(PID|96971)' | grep -v ' 0 '
UID PID PPID LWP C NLWP STIME TTY TIME CMD
traine 96971 96223 97283 95 9 13:52 ? 03:04:30 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -r script -nojvm
traine 96971 96223 97310 21 9 13:52 ? 00:42:07 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -r script -nojvm
traine 96971 96223 97311 21 9 13:52 ? 00:41:39 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -r script -nojvm
traine 96971 96223 97312 21 9 13:52 ? 00:42:18 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -r script -nojvm
$ ssh $n ps -eLf | egrep '(PID|96972)' | grep -v ' 0 '
UID PID PPID LWP C NLWP STIME TTY TIME CMD
traine 96972 96278 97284 97 7 13:52 ? 03:09:31 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -r script -nojvm
traine 96972 96278 97308 21 7 13:52 ? 00:41:50 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -r script -nojvm
$ ssh $n ps -eLf | egrep '(PID|97005)' | grep -v ' 0 '
UID PID PPID LWP C NLWP STIME TTY TIME CMD
traine 97005 96342 97275 99 5 13:52 ? 03:13:49 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -r script -nojvm
$ ssh $n ps -eLf | egrep '(PID|97130)' | grep -v ' 0 '
UID PID PPID LWP C NLWP STIME TTY TIME CMD
traine 97130 96443 97282 99 5 13:52 ? 03:13:53 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -singleCompThread -r script -nojvm
**''top''** command
$ ssh $n top -H -b -n 1 | egrep '(COMMAND|MATLAB)' | grep -v 'S 0'
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
97281 traine 20 0 1785m 577m 73m R 101.2 0.9 185:50.23 MATLAB
97276 traine 20 0 1646m 572m 73m R 101.2 0.9 185:11.74 MATLAB
97275 traine 20 0 1452m 562m 73m R 101.2 0.9 194:31.12 MATLAB
97284 traine 20 0 1572m 562m 73m R 99.2 0.9 190:36.63 MATLAB
97282 traine 20 0 1452m 562m 73m R 99.2 0.9 194:14.58 MATLAB
97283 traine 20 0 1716m 575m 73m R 85.6 0.9 185:40.60 MATLAB
97316 traine 20 0 1785m 577m 73m S 62.3 0.9 41:28.48 MATLAB
97317 traine 20 0 1785m 577m 73m S 62.3 0.9 44:10.42 MATLAB
97315 traine 20 0 1785m 577m 73m S 60.3 0.9 42:15.26 MATLAB
97314 traine 20 0 1785m 577m 73m S 58.4 0.9 42:25.24 MATLAB
97311 traine 20 0 1716m 575m 73m S 33.1 0.9 42:02.23 MATLAB
97310 traine 20 0 1716m 575m 73m S 17.5 0.9 42:29.42 MATLAB
97312 traine 20 0 1716m 575m 73m S 17.5 0.9 42:34.32 MATLAB
97308 traine 20 0 1572m 562m 73m R 9.7 0.9 41:57.47 MATLAB
**''mpstat''** command
$ ssh $n mpstat -P ALL 1 2
Linux 2.6.32-504.30.3.el6.x86_64 (n182) 02/16/2016 _x86_64_ (20 CPU)
05:08:25 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
05:08:26 PM all 48.50 0.00 0.50 0.00 0.00 0.05 0.00 0.00 50.95
05:08:26 PM 0 99.00 0.00 0.00 0.00 0.00 1.00 0.00 0.00 0.00
05:08:26 PM 1 14.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 86.00
05:08:26 PM 2 54.55 0.00 0.00 0.00 0.00 0.00 0.00 0.00 45.45
05:08:26 PM 3 50.00 0.00 2.00 0.00 0.00 0.00 0.00 0.00 48.00
05:08:26 PM 4 53.47 0.00 0.99 0.00 0.00 0.00 0.00 0.00 45.54
05:08:26 PM 5 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
05:08:26 PM 6 44.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 56.00
05:08:26 PM 7 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
05:08:26 PM 8 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
05:08:26 PM 9 53.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 46.00
05:08:26 PM 10 74.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 26.00
05:08:26 PM 11 52.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 47.00
05:08:26 PM 12 9.00 0.00 4.00 0.00 0.00 0.00 0.00 0.00 87.00
05:08:26 PM 13 53.54 0.00 1.01 0.00 0.00 0.00 0.00 0.00 45.45
05:08:26 PM 14 0.99 0.00 0.99 0.00 0.00 0.00 0.00 0.00 98.02
05:08:26 PM 15 11.88 0.00 0.99 0.00 0.00 0.00 0.00 0.00 87.13
05:08:26 PM 16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
05:08:26 PM 17 1.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 99.00
05:08:26 PM 18 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
05:08:26 PM 19 99.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 0.00
05:08:26 PM CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
05:08:27 PM all 49.50 0.00 0.55 0.00 0.00 0.00 0.00 0.00 49.95
05:08:27 PM 0 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
05:08:27 PM 1 12.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 88.00
05:08:27 PM 2 58.00 0.00 1.00 0.00 0.00 0.00 0.00 0.00 41.00
05:08:27 PM 3 31.68 0.00 0.99 0.00 0.00 0.00 0.00 0.00 67.33
05:08:27 PM 4 63.64 0.00 0.00 0.00 0.00 0.00 0.00 0.00 36.36
05:08:27 PM 5 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
05:08:27 PM 6 26.26 0.00 0.00 0.00 0.00 0.00 0.00 0.00 73.74
05:08:27 PM 7 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
05:08:27 PM 8 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
05:08:27 PM 9 57.00 0.00 2.00 0.00 0.00 0.00 0.00 0.00 41.00
05:08:27 PM 10 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
05:08:27 PM 11 60.40 0.00 1.98 0.00 0.00 0.00 0.00 0.00 37.62
05:08:27 PM 12 11.00 0.00 3.00 0.00 0.00 0.00 0.00 0.00 86.00
05:08:27 PM 13 57.43 0.00 0.99 0.00 0.00 0.00 0.00 0.00 41.58
05:08:27 PM 14 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
05:08:27 PM 15 12.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 88.00
05:08:27 PM 16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
05:08:27 PM 17 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
05:08:27 PM 18 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
05:08:27 PM 19 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: CPU %usr %nice %sys %iowait %irq %soft %steal %guest %idle
Average: all 49.00 0.00 0.53 0.00 0.00 0.03 0.00 0.00 50.45
Average: 0 99.50 0.00 0.00 0.00 0.00 0.50 0.00 0.00 0.00
Average: 1 13.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 87.00
Average: 2 56.28 0.00 0.50 0.00 0.00 0.00 0.00 0.00 43.22
Average: 3 40.80 0.00 1.49 0.00 0.00 0.00 0.00 0.00 57.71
Average: 4 58.50 0.00 0.50 0.00 0.00 0.00 0.00 0.00 41.00
Average: 5 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: 6 35.18 0.00 0.00 0.00 0.00 0.00 0.00 0.00 64.82
Average: 7 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: 8 100.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00
Average: 9 55.00 0.00 1.50 0.00 0.00 0.00 0.00 0.00 43.50
Average: 10 87.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 13.00
Average: 11 56.22 0.00 1.49 0.00 0.00 0.00 0.00 0.00 42.29
Average: 12 10.00 0.00 3.50 0.00 0.00 0.00 0.00 0.00 86.50
Average: 13 55.50 0.00 1.00 0.00 0.00 0.00 0.00 0.00 43.50
Average: 14 0.50 0.00 0.50 0.00 0.00 0.00 0.00 0.00 99.00
Average: 15 11.94 0.00 0.50 0.00 0.00 0.00 0.00 0.00 87.56
Average: 16 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
Average: 17 0.50 0.00 0.00 0.00 0.00 0.00 0.00 0.00 99.50
Average: 18 0.00 0.00 0.00 0.00 0.00 0.00 0.00 0.00 100.00
Average: 19 99.50 0.00 0.50 0.00 0.00 0.00 0.00 0.00 0.00
**''qhost''** command
$ qhost -h $n
HOSTNAME ARCH NCPU NSOC NCOR NTHR NLOAD MEMTOT MEMUSE SWAPTO SWAPUS
----------------------------------------------------------------------------------------------
global - - - - - - - - - -
n182 lx-amd64 20 2 20 20 0.38 62.8G 5.1G 2.0G 11.5M
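''NLOAD'' is the load average normalized by ''NCPU'', so 0.38 on this 20-core node corresponds to about

  0.38 * 20 ≈ 7.6 busy cores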
===== Multiple distributed workers =====
===== Single computational threads =====
===== Monitoring Tools =====
There are several tools you can run to monitor the computational threads on your node. In this example, node n093 is running several MATLAB jobs.
* Ganglia (real time) ''http://mills.hpc.udel.edu/ganglia/?c=mills.hpc&h=n093''
* top
* ps
==== Using top ====
[dnairn@mills dnairn]$ ssh n093 top -b -n 1 | egrep '(COMMAND|MATLAB)'
PID USER PR NI VIRT RES SHR S %CPU %MEM TIME+ COMMAND
8209 matusera 20 0 12.7g 6.2g 62m S 1103.5 9.9 202:33.96 MATLAB
2622 matusera 20 0 6917m 256m 62m S 0.0 0.4 9783:37 MATLAB
4386 matusera 20 0 6928m 231m 62m S 0.0 0.4 2850:19 MATLAB
14939 matusera 20 0 6926m 230m 62m S 0.0 0.4 20139:22 MATLAB
16308 matusera 20 0 6930m 242m 62m S 0.0 0.4 24928:39 MATLAB
==== Using ps command ====
[dnairn@mills dnairn]$ ssh n093 ps -eo pid,ruser,pcpu,pmem,thcount,stime,time,command | egrep '(COMMAND|matlab)'
PID RUSER %CPU %MEM THCNT STIME TIME COMMAND
2622 matusera 21.1 0.3 90 Jul29 6-19:03:37 /home/software/matlab/R2011b/bin/glnxa64/MATLAB
4386 matusera 4.7 0.3 90 Jul19 1-23:30:19 /home/software/matlab/R2011b/bin/glnxa64/MATLAB
8209 matusera 1019 6.9 90 13:18 02:34:48 /home/software/matlab/R2011b/bin/glnxa64/MATLAB
14939 matusera 27.3 0.3 90 Jul10 13-23:39:21 /home/software/matlab/R2011b/bin/glnxa64/MATLAB
16308 matusera 46.6 0.3 90 Jul24 17-07:28:38 /home/software/matlab/R2011b/bin/glnxa64/MATLAB
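As the man page excerpt below explains, ''%CPU'' is cumulative CPU time divided by elapsed run time, so process 8209 at 1019% is keeping roughly 10 cores busy even though it owns 90 threads.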
Description of the custom column values, from the ps man page:
pid PID process ID number of the process.
ruser RUSER real user ID. This will be the textual user ID, if it can be obtained and the field
width permits, or a decimal representation otherwise.
%cpu %CPU cpu utilization of the process in "##.#" format. Currently, it is the CPU time used
divided by the time the process has been running (cputime/realtime ratio), expressed
as a percentage. It will not add up to 100% unless you are lucky. (alias pcpu).
%mem %MEM ratio of the process’s resident set size to the physical memory on the machine,
expressed as a percentage. (alias pmem).
thcount THCNT see nlwp. (alias nlwp). number of kernel threads owned by the process.
bsdstart START time the command started. If the process was started less than 24 hours ago, the
output format is " HH:MM", else it is "mmm dd" (where mmm is the three letters of the
month). See also lstart, start, start_time, and stime.
time TIME cumulative CPU time, "[dd-]hh:mm:ss" format. (alias cputime).
args COMMAND command with all its arguments as a string. Modifications to the arguments may be
shown. The output in this column may contain spaces. A process marked <defunct> is
partly dead, waiting to be fully destroyed by its parent. Sometimes the process args
will be unavailable; when this happens, ps will instead print the executable name in
brackets. (alias cmd, command). See also the comm format keyword, the -f option, and
the c option.
When specified last, this column will extend to the edge of the display. If ps can not
determine display width, as when output is redirected (piped) into a file or another
command, the output width is undefined. (it may be 80, unlimited, determined by the
TERM variable, and so on) The COLUMNS environment variable or --cols option may be
used to exactly determine the width in this case. The w or -w option may also be
used to adjust width.
==== ps for threads ====
Select the threads of PID 12035 that have some activity, i.e. a C (processor utilization) value that is not 0.
[dnairn@mills dnairn]$ ssh n093 ps -eLf | egrep '(PID|12035)' | grep -v ' 0 '
UID PID PPID LWP C NLWP STIME TTY TIME CMD
matusera 12035 11918 12082 98 90 16:39 pts/2 00:43:21 /home/software/matlab/R2011b/bin/glnxa64/MATLAB -nosplash -nodesktop
matusera 12035 11918 12132 67 90 16:39 pts/2 00:29:49 /home/software/matlab/R2011b/bin/glnxa64/MATLAB -nosplash -nodesktop
matusera 12035 11918 12133 67 90 16:39 pts/2 00:29:42 /home/software/matlab/R2011b/bin/glnxa64/MATLAB -nosplash -nodesktop
matusera 12035 11918 12134 67 90 16:39 pts/2 00:29:43 /home/software/matlab/R2011b/bin/glnxa64/MATLAB -nosplash -nodesktop
matusera 12035 11918 12135 67 90 16:39 pts/2 00:29:34 /home/software/matlab/R2011b/bin/glnxa64/MATLAB -nosplash -nodesktop
matusera 12035 11918 12136 67 90 16:39 pts/2 00:29:47 /home/software/matlab/R2011b/bin/glnxa64/MATLAB -nosplash -nodesktop
matusera 12035 11918 12137 67 90 16:39 pts/2 00:29:50 /home/software/matlab/R2011b/bin/glnxa64/MATLAB -nosplash -nodesktop
matusera 12035 11918 12138 67 90 16:39 pts/2 00:29:48 /home/software/matlab/R2011b/bin/glnxa64/MATLAB -nosplash -nodesktop
matusera 12035 11918 12139 67 90 16:39 pts/2 00:29:45 /home/software/matlab/R2011b/bin/glnxa64/MATLAB -nosplash -nodesktop
matusera 12035 11918 12140 67 90 16:39 pts/2 00:29:40 /home/software/matlab/R2011b/bin/glnxa64/MATLAB -nosplash -nodesktop
matusera 12035 11918 12141 67 90 16:39 pts/2 00:29:33 /home/software/matlab/R2011b/bin/glnxa64/MATLAB -nosplash -nodesktop
matusera 12035 11918 12142 67 90 16:39 pts/2 00:29:32 /home/software/matlab/R2011b/bin/glnxa64/MATLAB -nosplash -nodesktop
Twelve of the 90 threads are doing computation; these are the computational threads.