abstract:mills:status

An opt-in node status notification service is available for Mills users. It sends an email notification whenever a node in your workgroup transitions between any two of the following states:

  • offline: down and not reachable.
  • online: up and reachable, but queues are disabled.
  • accepting-jobs: up, reachable and queues are enabled.
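The three states above can be read as a function of two observations about a node: whether it is reachable, and whether its queues are enabled. The following is a minimal Python sketch of that mapping. The names `NodeState` and `classify` are hypothetical and are not part of any Mills tooling; how the service actually determines reachability is not described here.

```python
from enum import Enum


class NodeState(Enum):
    """The three node states named in the list above."""
    OFFLINE = "offline"
    ONLINE = "online"
    ACCEPTING_JOBS = "accepting-jobs"


def classify(reachable: bool, queues_enabled: bool) -> NodeState:
    """Map two observations onto one of the three states (illustrative only)."""
    if not reachable:
        # Down and not reachable; the queue flag is irrelevant in this case.
        return NodeState.OFFLINE
    if queues_enabled:
        # Up, reachable, and queues are enabled.
        return NodeState.ACCEPTING_JOBS
    # Up and reachable, but queues are disabled.
    return NodeState.ONLINE
```

For example, `classify(True, False)` yields `NodeState.ONLINE`.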

The service currently checks node statuses at the top of every hour and delivers email notifications at the bottom of the hour. If you opt in and any of your workgroup's nodes have changed state, you will receive a single email message detailing all of the status changes.
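The behaviour described above (compare each node's state with its previous state, then send one message covering every change) can be sketched as follows. This is an illustration under stated assumptions, not the service's actual implementation: the function names, the node names `n001` and `n002`, and the message layout are all hypothetical.

```python
from __future__ import annotations


def state_changes(previous: dict[str, str],
                  current: dict[str, str]) -> list[tuple[str, str, str]]:
    """Return (node, old_state, new_state) for every node whose state differs."""
    changes = []
    for node in sorted(current):
        old = previous.get(node)
        new = current[node]
        # Nodes with no earlier observation are skipped: no transition is known.
        if old is not None and old != new:
            changes.append((node, old, new))
    return changes


def digest(changes: list[tuple[str, str, str]]) -> str | None:
    """Build ONE message body listing all changes, or None if nothing changed."""
    if not changes:
        return None
    return "\n".join(f"{node}: {old} -> {new}" for node, old, new in changes)


# Hypothetical hourly snapshots for two nodes in one workgroup.
last_hour = {"n001": "accepting-jobs", "n002": "online"}
this_hour = {"n001": "offline", "n002": "online"}
body = digest(state_changes(last_hour, this_hour))
```

Here only `n001` changed state, so `body` holds a single line for that node; if no node had changed, `digest` would return `None` and no message would be needed.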

To opt in to the node status notification service for your workgroup(s), send an email to consult@udel.edu with the subject "Node notification opt-in Mills" and make the first line of the message body:

    Type=Cluster
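If you prefer to compose the opt-in request programmatically, the message can be built with Python's standard `email.message.EmailMessage` class. This is only a sketch: the function name `build_opt_in_message` and the sender address in the example are hypothetical, and actually sending the message (for instance with `smtplib` and your own mail server) is left out because it depends on your mail setup.

```python
from email.message import EmailMessage


def build_opt_in_message(sender: str) -> EmailMessage:
    """Compose the opt-in request with the documented recipient, subject and body."""
    msg = EmailMessage()
    msg["From"] = sender  # your own address (assumption: supplied by the caller)
    msg["To"] = "consult@udel.edu"
    msg["Subject"] = "Node notification opt-in Mills"
    # The first line of the body must be exactly this.
    msg.set_content("Type=Cluster\n")
    return msg


# Hypothetical sender address, for illustration only.
message = build_opt_in_message("user@example.edu")
```

Calling `print(message)` shows the headers and body so you can check them before sending.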
   

mills.hpc.udel.edu has live resources: system status, job stats, system alerts.

UD IT HPC has Mills machine information, including a database of node information, milestones, offline nodes, and nodes disabled for maintenance.

Cluster monitoring for Mills uses Ganglia to track the cluster's hardware components.

System Alerts for Mills is an opt-in service that notifies you about status changes on any of your workgroup's nodes.

Job statistics: Check here for the total number of jobs that ended on each day over a selected range (week, 2 weeks, month, 6 months, or year), with an overlay of the number of jobs that the job scheduler classified as "failed."
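The per-day totals described above amount to two counts for each day: all jobs that ended, and the subset classified as failed. A minimal sketch of that tally follows. The record layout (an end date paired with a failed flag), the function name `daily_totals`, and the sample records are assumptions made for illustration; they do not come from the Mills job scheduler.

```python
from collections import Counter
from datetime import date


def daily_totals(jobs):
    """jobs: iterable of (end_date, failed) pairs.

    Returns {day: (jobs_ended, jobs_failed)} with days in ascending order.
    """
    ended = Counter()
    failed = Counter()
    for end_day, is_failed in jobs:
        ended[end_day] += 1
        if is_failed:
            failed[end_day] += 1
    # Counter returns 0 for days on which no job failed.
    return {day: (ended[day], failed[day]) for day in sorted(ended)}


# Made-up sample records, not real Mills data.
sample = [
    (date(2018, 5, 20), False),
    (date(2018, 5, 20), True),
    (date(2018, 5, 21), False),
]
totals = daily_totals(sample)
```

With these sample records, two jobs ended on the first day (one of them failed) and one job ended on the second day.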

  • abstract/mills/status.txt
  • Last modified: 2018-05-21 19:57
  • by sraskar