===== Introduction =====

Without a job scheduler, a cluster user would need to manually search for the resources required by his or her job, perhaps by randomly logging in to nodes and checking for other users' programs already executing thereon.  The user would have to "sign-out" the nodes he or she wishes to use in order to notify the other cluster users of resource availability((Historically, this is actually how some clusters were managed!)).  A computer will perform this kind of chore more quickly and efficiently than a human can, and with far greater sophistication.
  
Slurm is an open source, fault-tolerant, and highly scalable cluster management and job scheduling system for large and small Linux clusters.  Documentation for the current version of Slurm is provided by SchedMD: [[https://slurm.schedmd.com/documentation.html|SchedMD Slurm Documentation]].
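
For example, a typical Slurm workflow centers on three commands: ''sbatch'' to submit a batch script, ''squeue'' to monitor it, and ''scancel'' to cancel it.  A minimal sketch follows; the script contents and the job id are placeholders, not Caviness-specific values:

<code bash>
# Create a simple batch script (contents are illustrative)
cat > myjob.qs <<'EOF'
#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --ntasks=1
#SBATCH --time=00:10:00

echo "Running on $(hostname)"
EOF

# Submit it; sbatch prints "Submitted batch job <jobid>"
sbatch myjob.qs

# Monitor your queued and running jobs
squeue -u "$USER"

# Cancel a job by its numeric id if necessary
scancel <jobid>
</code>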
  
When migrating from one scheduler to another, such as from GridEngine to Slurm, you may find it helpful to refer to SchedMD's [[https://slurm.schedmd.com/rosetta.pdf|rosetta]] showing equivalent commands across various schedulers, as well as their [[https://slurm.schedmd.com/pdfs/summary.pdf|command/option summary (two pages)]].
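
A few common equivalences for that migration are sketched below; the job id is a placeholder, and the rosetta itself gives the full mapping:

<code bash>
# GridEngine command         Slurm equivalent
qsub job.qs                # sbatch job.qs       submit a batch script
qstat -u "$USER"           # squeue -u "$USER"   list your jobs
qdel 12345                 # scancel 12345       cancel job 12345
qhost                      # sinfo -N            show node status
</code>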
  
<note tip>It is a good idea to periodically check in ''/opt/shared/templates/slurm/'' for updated or new [[technical:slurm:caviness:templates:start|templates]] to use as job scripts to run generic or specific applications designed to provide the best performance on Caviness.</note>
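
For instance, a template can be copied into your home directory and adapted.  A brief sketch, assuming an illustrative template name (list the directory to see what is actually provided):

<code bash>
# See which templates are currently available
ls -R /opt/shared/templates/slurm/

# Copy one as a starting point ("generic/serial.qs" is an assumed
# example name -- check the listing for the real filenames), then
# edit and submit your copy
cp /opt/shared/templates/slurm/generic/serial.qs ~/myjob.qs
sbatch ~/myjob.qs
</code>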
  
Need help? See [[http://www.hpc.udel.edu/presentations/intro_to_slurm/|Introduction to Slurm]] in UD's HPC community cluster environment.
  
It is important for jobs to run on compute nodes and not login nodes.  Without effective limits in place, a single user could monopolize a login node and leave the cluster inaccessible to others.  Please review [[technical:generic:caviness-login-cpu-limit|Per-process CPU time limits on Caviness login nodes]], which summarizes the current resource limits as well as the need for and implementation of additional limits on the Caviness cluster login nodes.
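
If you need to work interactively, request a session on a compute node instead of working on the login node.  A minimal sketch, assuming UD's ''workgroup'' command for selecting an allocation group and placeholder resource values (the exact behavior of ''salloc'' depends on site configuration):

<code bash>
# Select your workgroup first (replace <investing-entity> with your
# group's name)
workgroup -g <investing-entity>

# Request a short interactive allocation; on Caviness this opens a
# shell on the allocated compute node (resource values are placeholders)
salloc --ntasks=1 --time=30:00

# Commands now run on the compute node rather than the login node
hostname

# Release the allocation when finished
exit
</code>
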
===== Queues =====
  