Computational models for running Matlab on a shared cluster

By default, MATLAB uses multiple computational threads. From the MATLAB R2011b documentation:

matlab -singleCompThread limits MATLAB to a single computational thread. 
By default, MATLAB makes use of the multithreading capabilities of the 
computer on which it is running.

The default, multiple computational threads, is never a good option when you are sharing a node. Either use the -singleCompThread option when you start MATLAB, or schedule the MATLAB job with the exclusive option of the cluster's job scheduler, such as the -l exclusive=1 option for Grid Engine or #SBATCH --exclusive for Slurm.

Having exclusive access to a node does not mean MATLAB will use all of its cores and memory. You should monitor the job to determine its actual memory and core requirements. To take advantage of multiple cores you must use the built-in matrix functions; you should see CPU utilization over 100% while the matrix functions are executing.
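MATLAB's built-in matrix functions get their parallelism from the threaded BLAS (MKL) behind them, and the same over-100% effect can be observed in any BLAS-backed environment. As a rough analogue only (not MATLAB itself), this Python sketch with NumPy, assuming a multithreaded BLAS build, compares process CPU time to wall time during a matrix multiply:

```python
import time
import numpy as np

# A 1500x1500 matrix multiply; NumPy dispatches this to its BLAS backend,
# which (like MATLAB's MKL) may run on several threads at once.
a = np.random.rand(1500, 1500)
t_wall, t_cpu = time.perf_counter(), time.process_time()
c = a @ a
wall = time.perf_counter() - t_wall
cpu = time.process_time() - t_cpu

# cpu/wall > 1 means more than one core was busy, i.e. >100% utilization
print(f"utilization ~ {100 * cpu / max(wall, 1e-9):.0f}%")
```

A CPU-to-wall ratio above 1 means more than one core was active during the multiply, which is what the >100% CPU figures in top correspond to.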

MATLAB can, with the Parallel Computing Toolbox, create a parallel pool of workers to which tasks are dispatched in parallel.

Multiple computational threads on one node

MATLAB makes use of the multithreading capabilities of the computer on which it is running. MATLAB uses MKL as its BLAS and LAPACK backend; the versions can be determined with the MATLAB commands:

version -blas
version -lapack

To make full use of the MKL computational threads you need to use the built-in matrix functions. The work needed to execute a built-in function will be distributed to multiple cores using MKL threads, which are compatible with OpenMP threads. All the cores share the same memory, so this is also called the shared-memory model for parallel computing. A simple model of the total MATLAB job, where p is the fraction of the work that runs in parallel on all 20 cores and WALL is the wall-clock time, is

  CPU = (p*20 + (1-p))*WALL

The actual number of computational threads is not explicitly stated in the Unix documentation. For Windows, the documentation specifies that MATLAB will use all the cores on the machine. This is clearly not appropriate for Unix clusters. Observations on Mills show that MATLAB may use all the cores, but averages much less. To use more than one core, the MATLAB job must exercise the standard high-performance libraries (MKL) linked into the MATLAB executable. This works well, but is not optimized for the Mills processor or threading libraries.
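The model can be inverted to estimate the parallel fraction p (the fraction of the work that runs on all 20 cores) from a job's measured CPU and wall-clock times. A minimal sketch in Python, using the qacct figures reported below for the exclusive-access test job (cpu 53089.736 s, ru_wallclock 8037.427 s):

```python
# Invert CPU = (p*cores + (1-p))*WALL to solve for p.
def parallel_fraction(cpu, wall, cores=20):
    busy = cpu / wall              # average number of busy cores
    return (busy - 1) / (cores - 1)

# qacct values (seconds) for the exclusive-access job:
p = parallel_fraction(53089.736, 8037.427)
print(f"p ~ {p:.2f}")  # about 0.30: roughly 30% of the work ran on all 20 cores
```

An average of about 6.6 busy cores on a 20-core node is consistent with the observation that MATLAB may use all the cores but averages much less.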

Test batch jobs using Grid Engine

Several copies of the same MATLAB script were submitted to run simultaneously; they varied only in their batch script directives.

Batch job with exclusive access (only job on node)

Part of batch script file:

$ tail -4 batche.qs
#$ -l exclusive=1

vpkg_require matlab/r2014b
matlab -nodisplay -nojvm -r 'script'

CGROUP report from batch output file:

$ grep CGROUPS *.o425422
[CGROUPS] No /cgroup/memory/UGE/425422.1 exists for this job
[CGROUPS] UD Grid Engine cgroup setup commencing
[CGROUPS] Setting none bytes (vmem none bytes) on n171 (master)
[CGROUPS]   with 20 cores = 
[CGROUPS] done.

Memory and timing results:

$ qacct -h n171 -j 425422 | egrep '(start|maxvmem|maxrss|cpu|wallclock|failed)'
start_time   02/16/2016 13:52:16.213
failed       0    
ru_wallclock 8037.427     
ru_maxrss    658584              
cpu          53089.736    
maxvmem      2.882G
maxrss       644.949M

Batch job with 5 slots, 370 MB per core (1.85 GB total)

Part of batch script file:

$ tail -6 batch5.qs
#$ -pe threads 5
#$ -l mem_total=1.9G
#$ -l m_mem_free=370M

vpkg_require matlab/r2015a
matlab -nodisplay -nojvm -r 'script'

CGROUP report from batch output file:

$ grep CGROUPS *.o428562
[CGROUPS] UD Grid Engine cgroup setup commencing
[CGROUPS] Setting 388050944 bytes (vmem none bytes) on n139 (master)
[CGROUPS]   with 5 cores = 
[CGROUPS] done.

Memory and timing results:

$ qacct -h n139 -j 428562 | egrep '(start|maxvmem|maxrss|cpu|wallclock|failed)'
start_time   02/17/2016 18:22:54.254
failed       0    
ru_wallclock 5.297        
ru_maxrss    165232              
cpu          3.090        
maxvmem      1017.906M
maxrss       155.109M

Batch job with 4 slots, 1 GB per core (4 GB total)

Part of batch script file:

$ cat batch.qs
#$ -pe threads 4
#$ -l m_mem_free=1G

vpkg_require matlab/r2014b
matlab -nodisplay -nojvm -r 'script'

CGROUP report from batch output file:

$ grep CGROUPS *.o418695
[CGROUPS] UD Grid Engine cgroup setup commencing
[CGROUPS] Setting 1073741824 bytes (vmem none bytes) on n036 (master)
[CGROUPS]   with 4 cores = 0 2 4 6
[CGROUPS] done.

This job is sharing the node with the previous job, which is on cores 5-8.

Memory and timing results:

$ qacct -h n036 -j 418695 | egrep '(maxvmem|maxrss|cpu|wallclock|failed)'
failed       0    
ru_wallclock 826.759      
ru_maxrss    595188              
cpu          1629.194     
maxvmem      1.801G
maxrss       583.039M

Batch job with 3 slots, 1 GB per core (3 GB total)

Part of batch script file:

$ cat batch.qs
#$ -pe threads 3
#$ -l m_mem_free=1G

vpkg_require matlab/r2014b
matlab -nodisplay -nojvm -r 'script'

CGROUP report from batch output file:

$ grep CGROUPS *.o408597
[CGROUPS] UD Grid Engine cgroup setup commencing
[CGROUPS] Setting 3221225472 bytes (vmem 9223372036854775807 bytes) on n039 (master)
[CGROUPS]   with 3 cores = 0-2
[CGROUPS] done.

Memory and timing results:

$ qacct -h n039 -j 408597 | egrep '(maxvmem|maxrss|cpu|wallclock)'
ru_wallclock 13877.991    
ru_maxrss    2089812             
cpu          90776.109    
maxvmem      4.180G
maxrss       0.000

Batch job with 2 slots, 3.1 GB per core (6.2 GB total)

3.1 GB per slot on a 20-core node is 62 GB, which lets jobs fill all 20 slots with 2 GB to spare for system overhead.

Part of batch script file:

$ cat batch.qs
#$ -pe threads 2
#$ -l m_mem_free=3.1G

vpkg_require matlab/r2014b
matlab -nodisplay -nojvm -r 'script'

CGROUP report from batch output file:

$ grep CGROUPS *.o408598
[CGROUPS] UD Grid Engine cgroup setup commencing
[CGROUPS] Setting 6657200128 bytes (vmem 9223372036854775807 bytes) on n039 (master)
[CGROUPS]   with 2 cores = 3-4
[CGROUPS] done.

This job is sharing the node with the previous job; it was assigned cores 3-4.

Memory and timing results:

$ qacct -h n039 -j 408598 | egrep '(maxvmem|maxrss|cpu|wallclock)'
ru_wallclock 13904.972    
ru_maxrss    2152212             
cpu          92110.859    
maxvmem      4.208G
maxrss       0.000

Batch job with 1 slot, 3.1 GB per core (3.1 GB total)

3.1 GB per slot on a 20-core node is 62 GB, which allows 20 of these one-slot jobs to fit with 2 GB to spare for system overhead.

Part of batch script file:

$ cat batch.qs
#$ -l m_mem_free=3.1G

vpkg_require matlab/r2014b
matlab -nodisplay -nojvm -r 'script'

CGROUP report from batch output file:

$ grep CGROUPS *.o408599
[CGROUPS] UD Grid Engine cgroup setup commencing
[CGROUPS] Setting 3328602112 bytes (vmem 9223372036854775807 bytes) on n036 (master)
[CGROUPS]   with 1 core = 0
[CGROUPS] done.

Memory and timing results:

$ qacct -h n036 -j 408599 | egrep '(maxvmem|maxrss|cpu|wallclock)'
ru_wallclock 8607.872     
ru_maxrss    1935860             
cpu          51805.427    
maxvmem      4.036G
maxrss       0.000

Table

Requested and used memory and time (cpu and wallclock in seconds):

jobid   host  cores        memory      maxvmem  cpu        wallclock
408594  n038  all (20)     all (<64G)  4.155G   51321.533  8613.132
408595  n037  5            5G          4.043G   86578.676  13051.171
408596  n037  4            4G          4.301G   86330.547  13067.863
408597  n039  3            3G          4.180G   90776.109  13877.991
408598  n039  2            6.2G        4.208G   92110.859  13904.972
408599  n031  default (1)  3.1G        4.036G   51805.427  8607.872

Table: new runs, spread over separate nodes

Requested and used memory and time (cpu and wallclock in seconds):

jobid   host  cores        memory      maxvmem  cpu       wallclock
418705  n172  all (20)     all (<64G)  2.904G   5553.820  1089.789
418704  n039  5            5G          1.874G   1778.309  804.490
418695  n036  4            4G          1.801G   1629.194  826.759
418693  n037  3            3G          1.735G   1475.837  863.386
418691  n040  2            6.2G        1.662G   1334.752  944.711
418690  n038  default (1)  1G          1.536G   1164.087  1173.832

Table: new runs, on the same node

Requested and used memory and time (cpu and wallclock in seconds):

jobid   host  cores        memory      maxvmem  maxrss    cpu       wallclock
418768  n172  all (20)     all (<64G)  3.805G   1.633G    5246.490  882.568
418773  n036  5            5G          1.852G   578.457M  1953.868  930.284
418772  n036  4            4G          1.779G   579.109M  1800.191  949.475
418771  n036  3            3G          1.709G   570.246M  1660.543  996.545
418770  n036  2            6.2G        1.640G   557.363M  1543.664  1106.315
418769  n036  default (1)  1G          1.514G   564.840M  1356.694  1356.256

Graphs

As the number of cores increases, both the CPU time and the memory usage increase roughly linearly. The increased memory is explained by the need for private (unshared) per-thread memory. Parallel algorithms can sometimes achieve faster wall-clock time by recalculating some values, which increases the total CPU time.

Both CPU time and memory are costs of running your algorithm, since they limit the number of other users that can use the node. To chart both, consider a simple cost of CPU time times memory, in GB-hours. Thus we have two objectives: minimize wall-clock time and minimize this cost.

The two extremes of the Pareto optimization curve are both good choices: all the cores on one node gives the fastest run time, and one core is the least costly (so you can simultaneously run 20 jobs). The 4-core job is a good compromise.
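To make the trade-off concrete, the cost measure can be computed from the same-node table above. A small Python sketch (cpu and wallclock in seconds from qacct, memory in GB):

```python
# cost = CPU hours x memory (GB), using rows from the same-node table above
runs = {
    # cores: (cpu seconds, maxvmem GB, wallclock seconds)
    20: (5246.490, 3.805, 882.568),
    4:  (1800.191, 1.779, 949.475),
    1:  (1356.694, 1.514, 1356.256),
}

def cost_gb_hours(cpu_s, mem_gb):
    return cpu_s / 3600.0 * mem_gb

for cores, (cpu_s, mem_gb, wall_s) in runs.items():
    print(f"{cores:2d} cores: {cost_gb_hours(cpu_s, mem_gb):.2f} GB-hours, "
          f"wallclock {wall_s:.0f} s")
```

The 1-core job comes out cheapest and the 20-core job fastest, with the 4-core job between the two extremes.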

Commands while running

$ n=n182

ps command

$ ssh $n ps -eo pid,ruser,pcpu,pmem,thcount,stime,time,command | egrep '(COMMAND|matlab)'
   PID RUSER    %CPU %MEM THCNT STIME     TIME COMMAND
 96970 traine    182  0.8    10 13:52 05:51:25 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -r script -nojvm
 96971 traine    160  0.8     9 13:52 05:09:03 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -r script -nojvm
 96972 traine    119  0.8     7 13:52 03:50:15 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -r script -nojvm
 96974 traine    141  0.8     8 13:52 04:33:14 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -r script -nojvm
 97005 traine   99.5  0.8     5 13:52 03:11:43 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -r script -nojvm
 97130 traine   99.4  0.8     5 13:52 03:11:27 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -singleCompThread -r script -nojvm

ps command to get threads for one PID

$ ssh $n ps -eLf | egrep '(PID|96970)' | grep -v ' 0  '
UID         PID   PPID    LWP  C NLWP STIME TTY          TIME CMD
traine    96970  96222  97281 95   10 13:52 ?        03:04:20 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -r script -nojvm
traine    96970  96222  97314 21   10 13:52 ?        00:41:58 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -r script -nojvm
traine    96970  96222  97315 21   10 13:52 ?        00:41:43 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -r script -nojvm
traine    96970  96222  97316 21   10 13:52 ?        00:40:54 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -r script -nojvm
traine    96970  96222  97317 22   10 13:52 ?        00:43:43 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -r script -nojvm
$ ssh $n ps -eLf | egrep '(PID|96971)' | grep -v ' 0  '
UID         PID   PPID    LWP  C NLWP STIME TTY          TIME CMD
traine    96971  96223  97283 95    9 13:52 ?        03:04:30 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -r script -nojvm
traine    96971  96223  97310 21    9 13:52 ?        00:42:07 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -r script -nojvm
traine    96971  96223  97311 21    9 13:52 ?        00:41:39 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -r script -nojvm
traine    96971  96223  97312 21    9 13:52 ?        00:42:18 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -r script -nojvm
$ ssh $n ps -eLf | egrep '(PID|96972)' | grep -v ' 0  '
UID         PID   PPID    LWP  C NLWP STIME TTY          TIME CMD
traine    96972  96278  97284 97    7 13:52 ?        03:09:31 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -r script -nojvm
traine    96972  96278  97308 21    7 13:52 ?        00:41:50 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -r script -nojvm
$ ssh $n ps -eLf | egrep '(PID|97005)' | grep -v ' 0  '
UID         PID   PPID    LWP  C NLWP STIME TTY          TIME CMD
traine    97005  96342  97275 99    5 13:52 ?        03:13:49 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -r script -nojvm
$ ssh $n ps -eLf | egrep '(PID|97130)' | grep -v ' 0  '
UID         PID   PPID    LWP  C NLWP STIME TTY          TIME CMD
traine    97130  96443  97282 99    5 13:52 ?        03:13:53 /home/software/matlab/r2014b/bin/glnxa64/MATLAB -nodisplay -singleCompThread -r script -nojvm

top command

$ ssh $n top -H -b -n 1 | egrep '(COMMAND|MATLAB)' | grep -v 'S  0'
   PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND           
 97281 traine    20   0 1785m 577m  73m R 101.2  0.9 185:50.23 MATLAB           
 97276 traine    20   0 1646m 572m  73m R 101.2  0.9 185:11.74 MATLAB           
 97275 traine    20   0 1452m 562m  73m R 101.2  0.9 194:31.12 MATLAB           
 97284 traine    20   0 1572m 562m  73m R 99.2  0.9 190:36.63 MATLAB            
 97282 traine    20   0 1452m 562m  73m R 99.2  0.9 194:14.58 MATLAB            
 97283 traine    20   0 1716m 575m  73m R 85.6  0.9 185:40.60 MATLAB            
 97316 traine    20   0 1785m 577m  73m S 62.3  0.9  41:28.48 MATLAB            
 97317 traine    20   0 1785m 577m  73m S 62.3  0.9  44:10.42 MATLAB            
 97315 traine    20   0 1785m 577m  73m S 60.3  0.9  42:15.26 MATLAB            
 97314 traine    20   0 1785m 577m  73m S 58.4  0.9  42:25.24 MATLAB            
 97311 traine    20   0 1716m 575m  73m S 33.1  0.9  42:02.23 MATLAB            
 97310 traine    20   0 1716m 575m  73m S 17.5  0.9  42:29.42 MATLAB            
 97312 traine    20   0 1716m 575m  73m S 17.5  0.9  42:34.32 MATLAB            
 97308 traine    20   0 1572m 562m  73m R  9.7  0.9  41:57.47 MATLAB

mpstat command

         
$ ssh $n mpstat -P ALL 1 2
Linux 2.6.32-504.30.3.el6.x86_64 (n182) 	02/16/2016 	_x86_64_	(20 CPU)

05:08:25 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
05:08:26 PM  all   48.50    0.00    0.50    0.00    0.00    0.05    0.00    0.00   50.95
05:08:26 PM    0   99.00    0.00    0.00    0.00    0.00    1.00    0.00    0.00    0.00
05:08:26 PM    1   14.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   86.00
05:08:26 PM    2   54.55    0.00    0.00    0.00    0.00    0.00    0.00    0.00   45.45
05:08:26 PM    3   50.00    0.00    2.00    0.00    0.00    0.00    0.00    0.00   48.00
05:08:26 PM    4   53.47    0.00    0.99    0.00    0.00    0.00    0.00    0.00   45.54
05:08:26 PM    5  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
05:08:26 PM    6   44.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   56.00
05:08:26 PM    7  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
05:08:26 PM    8  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
05:08:26 PM    9   53.00    0.00    1.00    0.00    0.00    0.00    0.00    0.00   46.00
05:08:26 PM   10   74.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   26.00
05:08:26 PM   11   52.00    0.00    1.00    0.00    0.00    0.00    0.00    0.00   47.00
05:08:26 PM   12    9.00    0.00    4.00    0.00    0.00    0.00    0.00    0.00   87.00
05:08:26 PM   13   53.54    0.00    1.01    0.00    0.00    0.00    0.00    0.00   45.45
05:08:26 PM   14    0.99    0.00    0.99    0.00    0.00    0.00    0.00    0.00   98.02
05:08:26 PM   15   11.88    0.00    0.99    0.00    0.00    0.00    0.00    0.00   87.13
05:08:26 PM   16    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
05:08:26 PM   17    1.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   99.00
05:08:26 PM   18    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
05:08:26 PM   19   99.00    0.00    1.00    0.00    0.00    0.00    0.00    0.00    0.00

05:08:26 PM  CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
05:08:27 PM  all   49.50    0.00    0.55    0.00    0.00    0.00    0.00    0.00   49.95
05:08:27 PM    0  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
05:08:27 PM    1   12.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   88.00
05:08:27 PM    2   58.00    0.00    1.00    0.00    0.00    0.00    0.00    0.00   41.00
05:08:27 PM    3   31.68    0.00    0.99    0.00    0.00    0.00    0.00    0.00   67.33
05:08:27 PM    4   63.64    0.00    0.00    0.00    0.00    0.00    0.00    0.00   36.36
05:08:27 PM    5  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
05:08:27 PM    6   26.26    0.00    0.00    0.00    0.00    0.00    0.00    0.00   73.74
05:08:27 PM    7  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
05:08:27 PM    8  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
05:08:27 PM    9   57.00    0.00    2.00    0.00    0.00    0.00    0.00    0.00   41.00
05:08:27 PM   10  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
05:08:27 PM   11   60.40    0.00    1.98    0.00    0.00    0.00    0.00    0.00   37.62
05:08:27 PM   12   11.00    0.00    3.00    0.00    0.00    0.00    0.00    0.00   86.00
05:08:27 PM   13   57.43    0.00    0.99    0.00    0.00    0.00    0.00    0.00   41.58
05:08:27 PM   14    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
05:08:27 PM   15   12.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   88.00
05:08:27 PM   16    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
05:08:27 PM   17    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
05:08:27 PM   18    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
05:08:27 PM   19  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00

Average:     CPU    %usr   %nice    %sys %iowait    %irq   %soft  %steal  %guest   %idle
Average:     all   49.00    0.00    0.53    0.00    0.00    0.03    0.00    0.00   50.45
Average:       0   99.50    0.00    0.00    0.00    0.00    0.50    0.00    0.00    0.00
Average:       1   13.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   87.00
Average:       2   56.28    0.00    0.50    0.00    0.00    0.00    0.00    0.00   43.22
Average:       3   40.80    0.00    1.49    0.00    0.00    0.00    0.00    0.00   57.71
Average:       4   58.50    0.00    0.50    0.00    0.00    0.00    0.00    0.00   41.00
Average:       5  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
Average:       6   35.18    0.00    0.00    0.00    0.00    0.00    0.00    0.00   64.82
Average:       7  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
Average:       8  100.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00
Average:       9   55.00    0.00    1.50    0.00    0.00    0.00    0.00    0.00   43.50
Average:      10   87.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00   13.00
Average:      11   56.22    0.00    1.49    0.00    0.00    0.00    0.00    0.00   42.29
Average:      12   10.00    0.00    3.50    0.00    0.00    0.00    0.00    0.00   86.50
Average:      13   55.50    0.00    1.00    0.00    0.00    0.00    0.00    0.00   43.50
Average:      14    0.50    0.00    0.50    0.00    0.00    0.00    0.00    0.00   99.00
Average:      15   11.94    0.00    0.50    0.00    0.00    0.00    0.00    0.00   87.56
Average:      16    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      17    0.50    0.00    0.00    0.00    0.00    0.00    0.00    0.00   99.50
Average:      18    0.00    0.00    0.00    0.00    0.00    0.00    0.00    0.00  100.00
Average:      19   99.50    0.00    0.50    0.00    0.00    0.00    0.00    0.00    0.00

qhost command

$ qhost -h $n
HOSTNAME                ARCH         NCPU NSOC NCOR NTHR NLOAD  MEMTOT  MEMUSE  SWAPTO  SWAPUS
----------------------------------------------------------------------------------------------
global                  -               -    -    -    -     -       -       -       -       -
n182                    lx-amd64       20    2   20   20  0.38   62.8G    5.1G    2.0G   11.5M

Multiple distributed workers

Single computational threads

Monitoring Tools

There are several tools you can run to monitor the computational threads on your node. In this example n093 is running several MATLAB jobs.

Using top

[dnairn@mills dnairn]$ ssh n093 top -b -n 1 | egrep '(COMMAND|MATLAB)'
  PID USER      PR  NI  VIRT  RES  SHR S %CPU %MEM    TIME+  COMMAND            
 8209 matusera  20   0 12.7g 6.2g  62m S 1103.5  9.9 202:33.96 MATLAB           
 2622 matusera  20   0 6917m 256m  62m S  0.0  0.4   9783:37 MATLAB             
 4386 matusera  20   0 6928m 231m  62m S  0.0  0.4   2850:19 MATLAB             
14939 matusera  20   0 6926m 230m  62m S  0.0  0.4  20139:22 MATLAB             
16308 matusera  20   0 6930m 242m  62m S  0.0  0.4  24928:39 MATLAB

Using ps command

[dnairn@mills dnairn]$ ssh n093 ps -eo pid,ruser,pcpu,pmem,thcount,stime,time,command | egrep '(COMMAND|matlab)'
  PID RUSER    %CPU %MEM THCNT STIME  TIME       COMMAND
 2622 matusera 21.1  0.3    90 Jul29 6-19:03:37  /home/software/matlab/R2011b/bin/glnxa64/MATLAB
 4386 matusera  4.7  0.3    90 Jul19 1-23:30:19  /home/software/matlab/R2011b/bin/glnxa64/MATLAB
 8209 matusera 1019  6.9    90 13:18 02:34:48    /home/software/matlab/R2011b/bin/glnxa64/MATLAB
14939 matusera 27.3  0.3    90 Jul10 13-23:39:21 /home/software/matlab/R2011b/bin/glnxa64/MATLAB
16308 matusera 46.6  0.3    90 Jul24 17-07:28:38 /home/software/matlab/R2011b/bin/glnxa64/MATLAB

Description of the custom column values, from the ps man page:

pid        PID      process ID number of the process.
ruser      RUSER    real user ID. This will be the textual user ID, if it can be obtained and the field
                    width permits, or a decimal representation otherwise.
%cpu       %CPU     cpu utilization of the process in "##.#" format. Currently, it is the CPU time used
                    divided by the time the process has been running (cputime/realtime ratio), expressed
                    as a percentage. It will not add up to 100% unless you are lucky. (alias pcpu).
                   
%mem       %MEM     ratio of the process’s resident set size  to the physical memory on the machine,
                    expressed as a percentage. (alias pmem).
thcount    THCNT    see nlwp. (alias nlwp). number of kernel threads owned by the process.
bsdstart   START    time the command started. If the process was started less than 24 hours ago, the
                    output format is " HH:MM", else it is "mmm dd" (where mmm is the three letters of the
                    month). See also lstart, start, start_time, and stime.
time       TIME     cumulative CPU time, "[dd-]hh:mm:ss" format. (alias cputime).
args       COMMAND  command with all its arguments as a string. Modifications to the arguments may be
                    shown. The output in this column may contain spaces. A process marked <defunct> is
                    partly dead, waiting to be fully destroyed by its parent. Sometimes the process args
                    will be unavailable; when this happens, ps will instead print the executable name in
                    brackets. (alias cmd, command). See also the comm format keyword, the -f option, and
                    the c option.
                    
                    When specified last, this column will extend to the edge of the display. If ps can not
                    determine display width, as when output is redirected (piped) into a file or another
                    command, the output width is undefined. (it may be 80, unlimited, determined by the
                    TERM variable, and so on) The COLUMNS environment variable or --cols option may be
                    used to exactly determine the width in this case. The w or -w option may be also be
                    used to adjust width.
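The THCNT/NLWP value that ps reports can also be read directly from /proc on Linux; a minimal sketch in Python (Linux-only, shown here counting the current process's own threads):

```python
import os

def thread_count(pid):
    # Each kernel thread of a process appears as a directory
    # under /proc/<pid>/task on Linux.
    return len(os.listdir(f"/proc/{pid}/task"))

print(thread_count(os.getpid()))
```

This is the same per-process thread count that `ps -o thcount` and the NLWP column of `ps -eLf` report.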

ps for threads

Select the threads of PID 12035 that show some activity, i.e. whose C (CPU usage) column is not 0.

[dnairn@mills dnairn]$ ssh n093 ps -eLf | egrep '(PID|12035)' | grep -v ' 0  ' 
UID        PID  PPID   LWP  C NLWP STIME TTY          TIME CMD
matusera 12035 11918 12082 98   90 16:39 pts/2    00:43:21 /home/software/matlab/R2011b/bin/glnxa64/MATLAB -nosplash -nodesktop
matusera 12035 11918 12132 67   90 16:39 pts/2    00:29:49 /home/software/matlab/R2011b/bin/glnxa64/MATLAB -nosplash -nodesktop
matusera 12035 11918 12133 67   90 16:39 pts/2    00:29:42 /home/software/matlab/R2011b/bin/glnxa64/MATLAB -nosplash -nodesktop
matusera 12035 11918 12134 67   90 16:39 pts/2    00:29:43 /home/software/matlab/R2011b/bin/glnxa64/MATLAB -nosplash -nodesktop
matusera 12035 11918 12135 67   90 16:39 pts/2    00:29:34 /home/software/matlab/R2011b/bin/glnxa64/MATLAB -nosplash -nodesktop
matusera 12035 11918 12136 67   90 16:39 pts/2    00:29:47 /home/software/matlab/R2011b/bin/glnxa64/MATLAB -nosplash -nodesktop
matusera 12035 11918 12137 67   90 16:39 pts/2    00:29:50 /home/software/matlab/R2011b/bin/glnxa64/MATLAB -nosplash -nodesktop
matusera 12035 11918 12138 67   90 16:39 pts/2    00:29:48 /home/software/matlab/R2011b/bin/glnxa64/MATLAB -nosplash -nodesktop
matusera 12035 11918 12139 67   90 16:39 pts/2    00:29:45 /home/software/matlab/R2011b/bin/glnxa64/MATLAB -nosplash -nodesktop
matusera 12035 11918 12140 67   90 16:39 pts/2    00:29:40 /home/software/matlab/R2011b/bin/glnxa64/MATLAB -nosplash -nodesktop
matusera 12035 11918 12141 67   90 16:39 pts/2    00:29:33 /home/software/matlab/R2011b/bin/glnxa64/MATLAB -nosplash -nodesktop
matusera 12035 11918 12142 67   90 16:39 pts/2    00:29:32 /home/software/matlab/R2011b/bin/glnxa64/MATLAB -nosplash -nodesktop

Twelve of the 90 threads are doing computation; these are the computational threads.