We have a couple of changes in preparation for a brief maintenance window planned one week from today, October 31.
1. GPU driver updates to enable CUDA 9 – this item requires updating the drivers on the GPU nodes and rebooting all GPU nodes so that all users can work with CUDA 9. To minimize the impact on jobs, we are scheduling a reservation window: GPU jobs submitted prior to the window will be held until all hardware has been updated, and CUDA 9 will be available afterwards. This should minimize GPU job disruption, but please be aware of this downtime period.
2. Partition consolidation – the ‘bw-gpu’ and ‘bw-parallel’ partitions will be consolidated into the ‘gpu’ and ‘parallel’ partitions, which will improve utilization for all users. FYI, the ‘bw’ stands for Intel Broadwell, as MARCC spans Intel architectures from Ivy Bridge to Broadwell. Next week, we will provide a blog post on using ‘constraints’ in SLURM batch scripts to request either ‘haswell’ or ‘broadwell’ nodes. If you have no preference, this should not affect your job submissions, but understand that we have configured SLURM to make a default selection.
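As a preview, a minimal sketch of what such a constraint might look like in a batch script, assuming our nodes advertise ‘haswell’ and ‘broadwell’ as SLURM features (the blog post will cover the details for our cluster):

```shell
#!/bin/bash
#SBATCH --job-name=myjob
#SBATCH --partition=parallel      # consolidated partition after the maintenance
#SBATCH --constraint=broadwell    # request Broadwell nodes only; omit to let SLURM choose
#SBATCH --time=01:00:00
#SBATCH --ntasks=24

# Your usual workload goes here
srun ./my_program
```

Equivalently, the constraint can be passed on the command line, e.g. `sbatch --constraint=haswell myscript.sh`. Without a `--constraint`, SLURM applies the default selection mentioned above.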
The MARCC Team