Technical documentation
This area contains technical notes written by UD IT as it builds and supports the University's research computing systems.
General Information
Articles in this section discuss general system administration tasks and observations.
Recipes
SLURM
Caviness
The following articles discuss some of the implementation details involved in tailoring the Slurm job scheduler to the Caviness HPC system.
Additional articles documenting the configuration and use of Slurm on Caviness:
DARWIN
The following articles discuss some of the implementation details involved in tailoring the Slurm job scheduler to the DARWIN HPC system.
Additional articles documenting the configuration and use of Slurm on DARWIN:
Grid Engine
The following articles discuss some of the implementation details involved in tailoring the Grid Engine job scheduler to the UD HPC systems.
- Using subordination to auto-suspend jobs
- Enhanced qlogin for X11 and group propagation
- Exclusive allocation of compute nodes
- Automated orphaned qlogin cleanup on cluster head nodes
- Adding cgroup integration to Grid Engine using prolog/epilog scripts
- Fully supporting Linux cgroups using UD's Grid Engine Cgroup Orchestrator (GECO) software
PERCEUS
Compute nodes are provisioned and managed using the open-source PERCEUS toolkit.
- Adding per-node init.d services without creating multiple VNFS images
Lustre Support
Lustre is a high-performance parallel filesystem.
Development
The following articles discuss technical aspects of software development on UD HPC systems.
- Using file striping with the Lustre MPI-IO interfaces in Open MPI
- Learn how your HPC workgroup can organize its own software installs and make use of VALET to streamline software maintenance
- WGSS: WorkGroup-Sponsored Software on the clusters
White papers
According to Wikipedia, a white paper is an authoritative report or guide that helps readers understand an issue, solve a problem, or make a decision. Although white papers are used primarily in marketing and government, the definition applies equally well to the computing world.
Performance benchmarking studies are occasionally run on the clusters or on new hardware being considered for them. Any significant findings that can be made public will be published in the white paper area of this site.