software:openmpi:mills

IT provides templates for three job script variants for the openmpi parallel environment on Mills, located in /opt/shared/templates/gridengine/openmpi (a minimal job script sketch follows this list):

  • OpenIB Verbs variant: openmpi-ibverb.qs
    • Generally, you should get good performance from this default, verbs-based OpenIB approach, which is not limited by the Grid Engine scheduler.
  • High-bandwidth InfiniBand variant: openmpi-psm.qs
    • This variant uses QLogic's proprietary PSM (Performance-Scaled Messaging) interface and should result in somewhat better InfiniBand performance. However, it also consumes more processor resources and makes the compute nodes owned by your group less accessible to others in the group, so this variant may increase your wait times in the Grid Engine queues. See the Performance-Scaled Messaging (PSM) discussion below for more details.
  • Low-bandwidth Ethernet variant: openmpi-gige.qs
    • If your Open MPI program has been tuned for Ethernet rather than InfiniBand, this job script may be more suitable than the standard openmpi (or openmpi-ibverb) script.
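
To illustrate the overall shape these variants share, here is a minimal sketch of a Grid Engine job script for the openmpi parallel environment. It is not one of the IT templates: the job name, slot count, and the executable name my_mpi_app are placeholders, and the real templates contain additional, variant-specific setup you should not omit.

	#!/bin/bash
	#
	# Minimal sketch only -- not one of the IT-provided templates.
	# The job name, slot count, and my_mpi_app are placeholders.
	#$ -N my_mpi_job
	#$ -cwd
	#$ -pe openmpi 24
	#
	# Grid Engine sets NSLOTS to the number of slots granted to this job.
	mpirun -np ${NSLOTS} ./my_mpi_app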

You may copy and customize these templates to get the best performance when running your Open MPI applications. See Running Applications on Mills for details about the available resources. The options you select are best understood by reading about Mills tuning and threading performance and how those choices affect the resources used by, and the behavior of, your applications.
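
For example, to start from the PSM variant, you might copy it into your working directory, edit the copy (job name, slot count, and the commands that run your application), and submit it with qsub. The file name myjob.qs below is only an example:

	cp /opt/shared/templates/gridengine/openmpi/openmpi-psm.qs myjob.qs
	# ... edit myjob.qs (job name, slot count, your program's commands) ...
	qsub myjob.qs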

It is a good idea to periodically check /opt/shared/templates/gridengine/openmpi for changes to existing templates, or the addition of new templates, designed to provide the best performance on Mills.
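
One simple way to spot such changes is to compare your customized copy against the template it was derived from, for example (using the example file name from above):

	diff myjob.qs /opt/shared/templates/gridengine/openmpi/openmpi-psm.qs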

Performance-Scaled Messaging (PSM) is an accelerated interface between MPI libraries and the InfiniBand network adapter. The PSM software uses hardware contexts to provide the direct interface between an MPI library and the InfiniBand hardware, and there are a limited number of contexts available on a node: 16, to be exact. So on a 24-core node there is no one-to-one availability of contexts to cores. The default behavior of PSM-aware software is to grab as many of the contexts as possible, so if you end up sharing a node with other MPI programs and those programs:

  • are (aggregate) using >= 16 cores of the node
  • are NOT using IT's PSM-aware job script template

then there are likely zero PSM contexts left for your job. The library's way of telling you this is an error message like:

	ipath_userinit: assign_context command failed: Network is down
	can't open /dev/ipath, network down (err=26)

This error message is not terribly intuitive.

The PSM-aware job script template that IT provides, openmpi-psm.qs, includes a section of BASH code that determines how many PSM contexts are available on each node on which your job is scheduled and sets environment variables to limit PSM usage accordingly.
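
The authoritative logic is in the template itself; what follows is only a simplified sketch of the idea. It assumes the QLogic PSM library honors the PSM_SHAREDCONTEXTS_MAX environment variable, that each node provides 16 contexts, and that the job's slots are spread evenly across its nodes (NSLOTS and NHOSTS are set by Grid Engine):

	# Simplified sketch only -- the actual logic lives in openmpi-psm.qs.
	CORES_PER_NODE=24
	CONTEXTS_PER_NODE=16
	# Approximate slots per node, assuming an even distribution.
	SLOTS_PER_NODE=$(( NSLOTS / NHOSTS ))
	# Claim only the job's proportional share of the node's PSM contexts,
	# leaving the remainder for other PSM jobs sharing the node.
	MAX_CONTEXTS=$(( SLOTS_PER_NODE * CONTEXTS_PER_NODE / CORES_PER_NODE ))
	[ ${MAX_CONTEXTS} -lt 1 ] && MAX_CONTEXTS=1
	export PSM_SHAREDCONTEXTS_MAX=${MAX_CONTEXTS}
	# Open MPI's -x option forwards the variable to the remote MPI processes.
	mpirun -x PSM_SHAREDCONTEXTS_MAX -np ${NSLOTS} ./my_mpi_app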
