technical:generic:farber-microcode-201904

technical:generic:farber-microcode-201904 [2019-04-10 11:56] – created frey
technical:generic:farber-microcode-201904 [2019-04-23 11:57] (current) – [Mitigation] frey
Line 29:
 ===== Mitigation =====
  
-All compute nodes in Farber will need to be rebooted in order to apply the microcode update to the processors.  There are two options to effecting this change:
+All compute nodes in Farber will need to be rebooted in order to apply the microcode update to the processors.  A staged reboot procedure will be used:
  
-==== Scheduled Maintenance ====
+  * All queues on all nodes will be disabled.  Jobs currently running on a node will continue running, but no additional jobs will start on the node.
+  * Once all jobs running on a node have completed, the node will be rebooted.
+  * Once the node is online again, its queues will be restored to their previous state and jobs can again run on it.
  
-Users would be asked to clear all jobs from the queues prior to the maintenance time.  During a short (1 to 2 hours) maintenance window every compute node would be rebooted.
+At 9:00 the morning of **April 29, 2019**, this staged process will commence.  Users wishing to accelerate the pace at which the procedure completes are invited to kill any jobs running at that time, but doing so is not mandated.  In particular, users should exit any open ''qlogin'' sessions as soon as possible.  An announcement will be sent via email when the procedure begins and when it completes.  Since the procedure could take some time to complete, periodic updates may be sent by IT staff.
-
-|**PROS**|All nodes transition to having the microcode update at once|
-|**CONS**|Downtime is required|
-
-==== Staged Reboots ====
-
-High-priority, exclusive-access jobs would be submitted by IT targeting every node.  The job would simply reboot the node.
-
-|**PROS**|No downtime required|
-|**CONS**|Presence of microcode update is in flux for an undefined period of time|
  
+<note important>IT staff may also contact owners of long-running jobs if the procedure is taking **longer than 1 week** to discuss whether or not the job(s) could be terminated.</note>
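
The staged reboot procedure in the current revision can be read as a small per-node state machine: disable the queues, wait for running jobs to finish, reboot, then restore the queues to their previous state. The following is a minimal sketch of that logic as a pure simulation. It does not call any scheduler or node-management command; the names ''Node'', ''NodeState'', ''disable_queues'', ''step'' and ''run_staged_reboot'' are hypothetical, and the "one job finishes per tick" rule is only an assumption made so the simulation terminates.

```python
from dataclasses import dataclass
from enum import Enum


class NodeState(Enum):
    ONLINE = "online"        # queues in their normal (or restored) state
    DRAINING = "draining"    # queues disabled, running jobs allowed to finish
    REBOOTING = "rebooting"  # no jobs left, reboot in progress


@dataclass
class Node:
    name: str
    running_jobs: int
    queues_enabled: bool = True
    saved_queues_enabled: bool = True  # queue state recorded before draining
    state: NodeState = NodeState.ONLINE
    updated: bool = False  # True once the node has come back from its reboot


def disable_queues(nodes):
    """Stage 1: remember each node's queue state, then disable its queues."""
    for node in nodes:
        node.saved_queues_enabled = node.queues_enabled
        node.queues_enabled = False
        node.state = NodeState.DRAINING


def step(node):
    """Advance one node by a single stage if its precondition holds."""
    if node.state is NodeState.DRAINING and node.running_jobs == 0:
        # Stage 2: all jobs have completed, so the node can be rebooted.
        node.state = NodeState.REBOOTING
    elif node.state is NodeState.REBOOTING:
        # Stage 3: node is back online; restore the previous queue state.
        node.updated = True
        node.queues_enabled = node.saved_queues_enabled
        node.state = NodeState.ONLINE


def run_staged_reboot(nodes):
    """Simulate the whole procedure and return the number of ticks it took."""
    disable_queues(nodes)
    ticks = 0
    while not all(node.updated for node in nodes):
        for node in nodes:
            if node.state is NodeState.DRAINING and node.running_jobs > 0:
                node.running_jobs -= 1  # assumption: one job ends per tick
            else:
                step(node)
        ticks += 1
    return ticks
```

In this sketch each node proceeds independently, so an idle node is rebooted and back in service while busier nodes are still draining. That is the reason the total duration depends on the longest-running job, and why exiting ''qlogin'' sessions or ending jobs early shortens the procedure.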
  
  • Last modified: 2019-04-10 11:56
  • by frey