


Most compute nodes on a Community Cluster are owned by investing entities (faculty and staff). Clusters generally contain a small number of spare nodes that act as temporary replacements for owned nodes undergoing repair or replacement. Jobs are usually not assigned to these nodes since at any time they may be needed in this capacity.

Community Cluster users can make use of these otherwise idle nodes by special request. For example, a user publishing a paper may need to quickly execute a few follow-up calculations that were prompted by the peer review process. The user has just two days in which to run the jobs. In this case, the user could send a request to IT for access to a cluster's spare nodes for the next two days.

Investing entity stakeholders can also request access to spare nodes on behalf of their entire group of users.

Of course, if during that time IT needs the spare nodes to stand in for offline owned nodes, jobs running on the spare nodes may need to be killed. So while the spare nodes represent on-demand resources that can be used for jobs with a deadline, the user runs the risk that jobs will be interrupted and may not finish before that deadline.

If jobs running on spare nodes do need to be killed, IT will give the jobs' owners two hours' notice via email.

Access to spare nodes can be requested by submitting a Research Computing Help Request: specify ''High-Performance Computing'', select the appropriate cluster, and provide the following information in the problem details:

  • the reason for requesting spare nodes
  • the cluster on which you will run your jobs
  • a brief description of the jobs that will be run
  • the date range during which the jobs will be run

For the example cited above, the user might write the following:

I am writing a paper for the Journal of Physical Chemistry that includes simulations of ammonia dissolved in water at high pressure. A reviewer has questioned my results and I need to run two more short simulations to refute his claims.

I would like to use the spare nodes on the Mills cluster starting as soon as possible and lasting two days. The simulations will run via Open MPI across four nodes and should each last about 12 hours.

Job requests are reviewed by IT before access to spare nodes is granted.

Once access is granted, the spare nodes augment the owned nodes to which the user already has access. No additional flags need to be specified in the user's job scripts or on the command line when jobs are submitted; the spare nodes behave as though they were owned nodes when the user's jobs are scheduled.
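Because no extra flags are involved, a job for the scenario above is submitted exactly like any other batch job. Below is a minimal sketch of a Grid Engine submission script for the four-node, 12-hour Open MPI simulations; the parallel environment name ''openmpi'', the slot count, and the ''simulate'' binary are illustrative assumptions, not Mills-specific values:

```shell
#!/bin/bash
# Sketch of a batch script for the example request: an Open MPI run
# spanning four nodes for about 12 hours. The PE name ("openmpi"),
# the slot count, and the "simulate" binary are assumptions made for
# illustration, not values taken from the cluster documentation.
#$ -N ammonia-sim         # job name
#$ -pe openmpi 96         # request slots across four nodes (assumed 24 cores each)
#$ -l h_rt=12:00:00       # 12-hour wall-clock limit

# No spare-node flags appear anywhere in the script: once access has
# been granted, the scheduler treats spare nodes like owned nodes.
mpirun ./simulate input.dat
```

The script would be submitted with ''qsub'' as usual; whether the job lands on owned or spare nodes is decided entirely by the scheduler.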

  • Interactive jobs will not be scheduled on spare nodes; only batch jobs are permitted.
  • Spare node resources shall be granted on a per-request basis for a limited time period.
  • Access to spare nodes shall be granted following an IT review of the request.
  • Spare nodes augment those nodes already available to the user. Batch jobs will be scheduled on spare nodes without authorized users having to request them explicitly.
  • Spare nodes may be repurposed at any time to replace owned nodes being repaired or replaced. In these instances, IT will e-mail users running jobs on the affected spare nodes two hours prior to killing those jobs.
abstract/mills/runjobs/queues.1529459590.txt.gz · Last modified: 2018-06-19 21:53 by anita