It would be even more useful to respond to ''VASP_MPI_BLOCK=none'' or ''VASP_MPI_BLOCK=0'' by **not** fragmenting the array at all and instead issuing a single ''MPI_Reduce()''. Since many of the modern transport libraries underpinning MPI perform fragmentation themselves, and only when necessary, the conditions that prompted ''M_sum_master_d()'' back in the era of the Intel Pentium no longer exist.
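As a rough sketch of that idea (in C rather than VASP's Fortran, with the hypothetical ''sum_to_master()'' standing in for ''M_sum_master_d()''), a block size of zero would collapse the fragmentation loop into a single collective:

<code c>
#include <mpi.h>
#include <stdlib.h>

/* Hypothetical stand-in for VASP's M_sum_master_d(): sum n doubles in
 * buf onto rank 0 of comm, fragmenting into block-element pieces.  A
 * block of 0 (e.g. from VASP_MPI_BLOCK=none or =0) means "do not
 * fragment; issue one MPI_Reduce() and let the transport layer decide". */
static void sum_to_master(double *buf, int n, int block, MPI_Comm comm)
{
    int rank;
    MPI_Comm_rank(comm, &rank);

    if (block <= 0)
        block = n;   /* one fragment == a single MPI_Reduce() over the array */

    for (int off = 0; off < n; off += block) {
        int len = (n - off < block) ? (n - off) : block;
        if (rank == 0)
            MPI_Reduce(MPI_IN_PLACE, buf + off, len, MPI_DOUBLE,
                       MPI_SUM, 0, comm);
        else
            MPI_Reduce(buf + off, NULL, len, MPI_DOUBLE,
                       MPI_SUM, 0, comm);
    }
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    double v[4] = { 1.0, 2.0, 3.0, 4.0 };
    const char *s = getenv("VASP_MPI_BLOCK");
    int block = s ? atoi(s) : 8000;  /* atoi("none") == 0: no fragmentation */

    sum_to_master(v, 4, block, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}
</code>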

==== UCX control ====

When runtime variation of ''MPI_BLOCK'' did not completely remove the issue, further testing was performed. The data collected eventually led to Google searches that turned up a [[https://github.com/openucx/ucx/issues/6264|GitHub issue in the openucx project]]. In the discussion attached to that issue, one interesting point was raised:

<WRAP center round alert 60%>
When setting ''UCX_IB_RCACHE_MAX_REGIONS'' in a job's runtime environment, **please do not exceed a value of 9000** unless you have explicitly allocated more on-node tasks to the job than you will use. For example, requesting ''--nodes=1 --ntasks=8'' but running the MPI program with just 2 ranks implies that ''%%UCX_IB_RCACHE_MAX_REGIONS=$((9000*4))%%'' is permissible, since each of the 2 ranks can claim the budget of 8/2 = 4 allocated tasks (see the example batch script below).
</WRAP>
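For illustration only, a minimal Slurm batch script matching the exception described above might look like the following (''my_mpi_program'' is a placeholder):

<code bash>
#!/bin/bash
#SBATCH --nodes=1
#SBATCH --ntasks=8

# 8 tasks are allocated on the node but only 2 ranks will run, so each
# rank may claim the registration-cache budget of 8/2 = 4 tasks:
export UCX_IB_RCACHE_MAX_REGIONS=$((9000*4))

mpirun -np 2 ./my_mpi_program
</code>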