Getting started on Mills (Retired)
The Mills cluster, UD's first Community Cluster, was deployed in 2012 and is a distributed-memory Linux cluster. It consists of 200 compute nodes (5160 cores, 14.5 TB memory, 49.3 peak Tflops). The nodes are built from AMD “Interlagos” 12-core processors in dual- and quad-socket configurations, giving 24 or 48 cores per node. A QDR InfiniBand network fabric supports the Lustre filesystem (approximately 180 TB of usable space). Gigabit and 10-Gigabit Ethernet networks provide access to additional Mills filesystems and the campus network. The cluster was purchased with a proposed 5-year life, putting its retirement in the October 2016 to January 2017 time period.
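As a consistency check on the figures above, the following sketch works out how many 24-core and 48-core nodes would add up to 200 nodes and 5160 cores. It assumes every compute node is exactly one of those two types; the function name `split_nodes` is made up for this illustration, and the result is an inference from the published totals, not an official inventory.

```python
def split_nodes(total_nodes, total_cores, small=24, large=48):
    """Solve a + b = total_nodes and small*a + large*b = total_cores.

    Returns (a, b): the number of small-core and large-core nodes.
    Assumes every node has either `small` or `large` cores.
    """
    # Cores left over if every node were the small type
    extra = total_cores - small * total_nodes
    if extra < 0 or extra % (large - small) != 0:
        raise ValueError("totals are not consistent with a two-type mix")
    n_large = extra // (large - small)
    n_small = total_nodes - n_large
    if n_small < 0:
        raise ValueError("totals are not consistent with a two-type mix")
    return n_small, n_large


n24, n48 = split_nodes(200, 5160)
print(n24, n48)  # 185 15
```

Under that assumption the totals are consistent with 185 dual-socket and 15 quad-socket nodes (185 × 24 + 15 × 48 = 5160).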
The cluster was named to recognize the significant scientific and engineering contributions of Prof. Emeritus David L. Mills, Dept. of Electrical & Computer Engineering, UD.
For general information about the community cluster program, visit the IT Research Computing website. To cite the Mills cluster for grants, proposals and publications, use these HPC templates.
The login (head) node is online to allow file transfer off Mills' filesystems.
We must finally say goodbye to the Mills cluster. Thank you to all Mills cluster users for your cooperation and contributions to the UD research and HPC communities.
For complete details see Mills Retirement On August 21, 2023.
Configuration
Overview
An HPC system always has a public-facing system known as the login node or head node. The login node is supplemented by many compute nodes which are connected by a private network. Each compute node typically has several multi-core processors that share memory. Finally, the login nodes and the compute nodes share one or more filesystems over a high-speed network.
Login node
The login node is the primary node shared by all cluster users. Its computing environment is a full standard variant of UNIX configured for scientific applications. This includes command documentation (man pages), scripting tools, compiler suites, debugging/profiling tools, and application software. In addition, the login node has several tools to help you move files between the HPC filesystem and your local machine, other clusters, and web-based services.
If your work requires highly interactive graphics and animations, these are best done on your local workstation rather than on the cluster. Use the cluster to generate files containing the graphics information, and download them from the HPC system to your local system for visualization.
Compute nodes
Mills no longer has any compute nodes; only the login node is online.
Storage
Permanent filesystems
At UD, permanent filesystems are those that are backed up or replicated at an off-site disaster recovery facility. This always includes the home filesystem, which contains each user's home directory and has a modest per-user quota. A cluster may also have a larger permanent filesystem used for research group projects. The system is designed to let you recover older versions of files through a self-service process.
High-performance filesystems
One important component of HPC designs is to provide fast access to large files and to many small files. These days, high-performance filesystems have capacities ranging from hundreds of terabytes to petabytes. They are designed to use parallel I/O techniques to reduce file-access time. The Lustre filesystems in use at UD are composed of many physical disks using technologies such as RAID-6 to provide resilience, data integrity, and parallelism at multiple levels. They use high-bandwidth interconnects such as InfiniBand and 10-Gigabit Ethernet.
High-performance filesystems, such as Lustre, are typically designed as volatile scratch storage systems. This is because traversing the entire filesystem takes so much time that it becomes financially infeasible to create off-site backups. However, the RAID-6 design increases user confidence by providing a high level of built-in redundancy against hardware failure.
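To make the RAID-6 trade-off concrete, here is a minimal sketch of the general RAID-6 rule: each array gives up two disks' worth of capacity to parity and keeps data available with up to two failed disks. The function names and the disk layout in the example are hypothetical and do not describe the actual Mills Lustre hardware.

```python
def raid6_usable_tb(disks_per_array, disk_tb, arrays=1):
    """Usable capacity in TB when each array reserves two disks' worth of parity."""
    if disks_per_array < 4:
        raise ValueError("RAID-6 needs at least 4 disks per array")
    return arrays * (disks_per_array - 2) * disk_tb


def raid6_survives(failed_disks_in_array):
    """RAID-6 keeps data available with up to two failed disks in one array."""
    return failed_disks_in_array <= 2


# Hypothetical layout: 10 arrays of 10 x 2 TB disks (not Mills' real configuration).
print(raid6_usable_tb(10, 2.0, arrays=10))  # 160.0
print(raid6_survives(2), raid6_survives(3))  # True False
```

This redundancy protects against disk failure only. It does not protect against accidental deletion, which is why scratch data that matters should be copied to a permanent filesystem.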
Local filesystems
Each node typically has an internal, locally connected disk that is fast, but which does not use a parallel filesystem. Its capacity may be measured in terabytes. Part of the local disk is used for system tasks such as memory management, which might include cache memory and virtual memory. The remainder of the disk is ideal for applications that need large scratch storage for the duration of a run. That portion is referred to as the node scratch filesystem.
Each node scratch filesystem is accessible only from the node in which its disk is physically installed. Often, a job scheduling system, such as Grid Engine, creates a temporary directory associated with your job on this filesystem. When your job terminates normally, the job scheduler automatically erases that directory and its contents.
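Because the scheduler erases the job's temporary directory, any results written there must be copied elsewhere before the job ends. The sketch below shows one way a job might do this in Python. It assumes the scheduler exports the directory's path in the `TMPDIR` environment variable (Grid Engine does this for jobs); the function names, file names, and results location are hypothetical.

```python
import os
import shutil
import tempfile
from pathlib import Path


def scratch_dir():
    """Return the job's node scratch directory.

    Assumption: the scheduler exports TMPDIR for the job. Outside a job,
    fall back to the system temporary directory.
    """
    return Path(os.environ.get("TMPDIR", tempfile.gettempdir()))


def run_job(results_dir):
    """Do the heavy I/O on node scratch, then save only the final result."""
    results_dir = Path(results_dir)
    work = Path(tempfile.mkdtemp(prefix="work_", dir=scratch_dir()))
    try:
        # Intermediate data stays on the fast local disk (hypothetical file name).
        intermediate = work / "partial.dat"
        intermediate.write_text("\n".join(str(i * i) for i in range(10)))

        # Reduce the intermediate data to the final result.
        total = sum(int(line) for line in intermediate.read_text().split())
        final = work / "result.txt"
        final.write_text(f"{total}\n")

        # Copy the result to storage that outlives the job.
        results_dir.mkdir(parents=True, exist_ok=True)
        return Path(shutil.copy2(final, results_dir / final.name))
    finally:
        # Clean up explicitly so nothing is left behind if the job ends abnormally.
        shutil.rmtree(work, ignore_errors=True)
```

A job would call `run_job` with a directory on a permanent or shared filesystem, for example a path under the user's home directory.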
Software
IT-managed software: A list of installed software that IT builds and maintains for Mills users.
Documentation for all software is organized in alphabetical order on the sidebar.
Help
System or account problems, or can't find an answer on this wiki
If you are experiencing a system-related problem, first check Mills cluster monitoring and system alerts. To report a new problem, or if you can't find the help you need on this wiki, submit a Research Computing High Performance Computing (HPC) Clusters Help Request and complete the form, including Mills and your problem details in the description field.
Ask or tell the HPC community
hpc-ask is a Google group established to stimulate interactions within UD’s broader HPC community and is based on members helping members. This is a great venue to post a question about HPC, start a discussion, or share an upcoming event with the community. Anyone may request membership. Messages are sent as a daily summary to all group members. This list is archived, public, and searchable by anyone.
Publication and Grant Writing Resources
HPC templates are available to use for a proposal or publication to acknowledge use of or describe UD’s Information Technologies HPC resources.