technical:generic:mpi-and-ucx-mr-cache

Differences

This shows you the differences between two versions of the page.

technical:generic:mpi-and-ucx-mr-cache [2024-12-05 12:42] – [Solution] frey
technical:generic:mpi-and-ucx-mr-cache [2024-12-05 14:19] (current) frey
Line 58:
 Ideally, it would be even more useful to respond to ''VASP_MPI_BLOCK=none'' or ''VASP_MPI_BLOCK=0'' by **not** fragmenting the array and instead issuing a single ''MPI_Reduce()''.  Since many modern transport libraries underpinning MPI effect fragmentation themselves and only as necessary, the conditions that prompted ''M_sum_master_d()'' back in the era of the Intel Pentium no longer exist.
  
-==== UCX Control ====
+==== UCX control ====
  
 When runtime variation of ''MPI_BLOCK'' did not completely remove the issue, further testing was performed.  The data collected eventually led to Google searches that returned a [[https://github.com/openucx/ucx/issues/6264|GitHub issue with the openucx project]].  In the dialog associated with the issue, one interesting point was raised:
Line 115:
  
 <WRAP center round alert 60%>
-When setting ''UCX_IB_RCACHE_MAX_REGIONS'' in a job's runtime environment, **please do not exceed a value of 9000** unless you have explicitly allocated more on-node tasks to the job than you will use.  E.g. requesting ''--nodes=1 --ntasks=8'' and running the MPI program with just 2 ranks implies that ''UCX_IB_RCACHE_MAX_REGIONS=$((9000*4))'' is permissible.
+When setting ''UCX_IB_RCACHE_MAX_REGIONS'' in a job's runtime environment, **please do not exceed a value of 9000** unless you have explicitly allocated more on-node tasks to the job than you will use.  E.g. requesting ''--nodes=1 --ntasks=8'' and running the MPI program with just 2 ranks implies that ''%%UCX_IB_RCACHE_MAX_REGIONS=$((9000*4))%%'' is permissible.
 </WRAP>
  
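The arithmetic in the alert box can be wrapped in a small bash function so the value is derived from the job's allocation rather than typed by hand. This is a minimal sketch, assuming the 9000-region ceiling applies per allocated on-node task slot and that the share belonging to unused slots may be split evenly among the ranks actually launched on that node. The function name ''rcache_max_regions'' is hypothetical and is not part of UCX or Slurm.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of UCX or Slurm): compute a value for
# UCX_IB_RCACHE_MAX_REGIONS that stays within the 9000-per-task guidance.
# ASSUMPTION: the 9000 ceiling applies per allocated on-node task slot, and
# the share of unused slots may be split evenly among the launched ranks.
rcache_max_regions() {
    local allocated_tasks="$1"   # on-node tasks allocated to the job (e.g. --ntasks=8 on one node)
    local ranks_used="$2"        # MPI ranks actually launched on that node
    local per_task_limit=9000    # ceiling quoted in the alert box
    local re='^[1-9][0-9]*$'     # positive integers only

    # Reject anything that is not a positive integer
    if ! [[ $allocated_tasks =~ $re && $ranks_used =~ $re ]]; then
        echo "usage: rcache_max_regions ALLOCATED_TASKS RANKS_USED" >&2
        return 1
    fi

    if (( allocated_tasks <= ranks_used )); then
        # No spare task slots: stay at the base ceiling
        echo "$per_task_limit"
    else
        # Multiply before dividing; bash integer division rounds down,
        # so the result never exceeds the pooled per-slot budget
        echo $(( per_task_limit * allocated_tasks / ranks_used ))
    fi
}
```

For the alert's example (''--nodes=1 --ntasks=8'' with 2 ranks), ''rcache_max_regions 8 2'' prints 36000, the same as ''9000*4''; the result could then be exported with ''%%export UCX_IB_RCACHE_MAX_REGIONS="$(rcache_max_regions 8 2)"%%''. Multiplying before dividing keeps uneven cases such as 8 tasks and 3 ranks at the pooled budget (24000) rather than truncating the ratio first.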
  • technical/generic/mpi-and-ucx-mr-cache.1733420533.txt.gz
  • Last modified: 2024-12-05 12:42
  • by frey