software:mpi4py:caviness

   3.6.5:20180613   Python MPI modules with Open MPI + system compiler
</code>

===== Sample mpi4py script =====
  
===== Batch job =====
  
Any MPI job requires you to use ''mpirun'' to initiate it, and this should be done through the Slurm job scheduler to best utilize the resources on the cluster. Also, if you want to run on more than 1 node (more than 36 cores), then you must initiate a batch job from the head node. Remember, if you only have 1 node in your workgroup, then you will need to take advantage of the [[abstract:caviness:runjobs:queues#the-standard-partition|standard]] partition to be able to run a job utilizing multiple nodes; however, keep in mind that using the standard partition means your job can be preempted, so you will need to be mindful of [[abstract:caviness:runjobs:schedule_jobs#handling-system-signals-aka-checkpointing|checkpointing]] your job.
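
For example, a minimal addition to a job script to target the standard partition might look like the line below; this is only a sketch, assuming the partition is named ''standard'' as in the linked documentation:

<code bash>
#SBATCH --partition=standard    # run outside your workgroup's nodes; the job may be preempted
</code>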
  
The best results have been found by using the //openmpi.qs// template for [[/software/openmpi/openmpi|Open MPI]] jobs. For example, copy the template and call it ''mympi4py.qs'' for the job script using
  
<code bash>
cp /opt/shared/templates/slurm/generic/mpi/openmpi/openmpi.qs mympi4py.qs
</code>
  
and modify it for your application. There are several ways to communicate the number and layout of worker processes. In this example, we will modify the job script to specify a single node and 4 cores using ''#SBATCH --nodes=1'' and ''#SBATCH --ntasks=4''. It is important to carefully read the comments and select the appropriate options for your job. Make sure you specify the correct VALET environment for your job, selecting the ''python-mpi'' version that matches Python 2 or 3. Since the above example is based on Python 2, we will specify the VALET package as follows:
  
<code bash>
vpkg_require python-mpi/2.7.15:20180613
</code>
  
Lastly, modify the section of the job script that executes your MPI program. In this example, it would be
  
<code>
${UD_MPIRUN} python scatter-gather.py
</code>
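
Taken together, the edits described above to ''mympi4py.qs'' would resemble the following sketch. The real template contains many more commented options, and the memory request shown here is only an assumption chosen to match the 1 GB per core used in the sample output below:

<code bash>
#SBATCH --nodes=1                # run all tasks on a single node
#SBATCH --ntasks=4               # 4 MPI worker processes
#SBATCH --mem-per-cpu=1024M      # assumption: 1 GB of memory per core

# VALET package for the Python 2 MPI build
vpkg_require python-mpi/2.7.15:20180613

# launch the mpi4py script via the template's mpirun wrapper variable
${UD_MPIRUN} python scatter-gather.py
</code>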
  
    
Submit the job script to Slurm using

<code bash>
sbatch mympi4py.qs
</code>
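
Slurm prints a job ID when the script is accepted. Assuming the template's default output settings, the output shown below is written to a ''slurm-<jobid>.out'' file in the submission directory; while the job is pending or running, you can check on it with, for example:

<code bash>
squeue -u $USER
</code>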
    
The following output is based on the Python 2 script ''scatter-gather.py'' submitted with 4 cores and 1 GB of memory per core in ''mympi4py.qs'' as described above:
  
<code bash>
Adding dependency `python/2.7.15` to your environment
Adding dependency `libfabric/1.6.1` to your environment
Adding dependency `openmpi/3.1.0` to your environment
Adding package `python-mpi/2.7.15:20180613` to your environment
-- Open MPI job setup complete (on r03n33):
--  mpi job startup      = /opt/shared/openmpi/3.1.0/bin/mpirun
--  nhosts               = 1
--  nproc                = 4
--  nproc-per-node       = 4
--  cpus-per-proc        = 1

-- Open MPI environment flags:
--  OMPI_MCA_btl_base_exclude=tcp
--  OMPI_MCA_rmaps_base_display_map=true
--  OMPI_MCA_orte_hetero_nodes=true
--  OMPI_MCA_hwloc_base_binding_policy=core
--  OMPI_MCA_rmaps_base_mapping_policy=core

 Data for JOB [51033,1] offset 0 Total slots allocated 4

 ========================   JOB MAP   ========================

 Data for node: r03n33  Num slots: 4    Max slots: 0    Num procs: 4
        Process OMPI jobid: [51033,1] App: 0 Process rank: 0 Bound: socket 0[core 0[hwt 0]]:[B/././.]
        Process OMPI jobid: [51033,1] App: 0 Process rank: 1 Bound: socket 0[core 1[hwt 0]]:[./B/./.]
        Process OMPI jobid: [51033,1] App: 0 Process rank: 2 Bound: socket 0[core 2[hwt 0]]:[././B/.]
        Process OMPI jobid: [51033,1] App: 0 Process rank: 3 Bound: socket 0[core 3[hwt 0]]:[./././B]

 =============================================================
After Scatter:
[0] [0. 1. 2. 3.]
[1] [4. 5. 6. 7.]
[2] [ 8.  9. 10. 11.]
[3] [12. 13. 14. 15.]
After Allgather:
[0] [ 0.  2.  4.  6.  8. 10. 12. 14. 16. 18. 20. 22. 24. 26. 28. 30.]
[1] [ 0.  2.  4.  6.  8. 10. 12. 14. 16. 18. 20. 22. 24. 26. 28. 30.]
[2] [ 0.  2.  4.  6.  8. 10. 12. 14. 16. 18. 20. 22. 24. 26. 28. 30.]
[3] [ 0.  2.  4.  6.  8. 10. 12. 14. 16. 18. 20. 22. 24. 26. 28. 30.]
</code>

===== Recipes =====
If you need to build a Python virtualenv based on a collection of Python modules including mpi4py, then you will need to follow this recipe to get a properly-integrated mpi4py module.

  * [[technical:recipes:mpi4py-in-virtualenv|Building a Python virtualenv with a properly-integrated mpi4py module]]