Dear Discovery users,
As part of the annual maintenance schedule to keep Northeastern’s high performance computing infrastructure fast, secure, and up-to-date, the Discovery cluster will be down June 6 – 14. This downtime is scheduled in conjunction with the annual power shutdown at the MGHPCC facility that hosts Discovery. Please plan your jobs on the Discovery cluster so that they do not overlap with this maintenance period.
There are a number of actions that you can take to prepare for this upgrade, and they may also help reduce the downtime. Please clean up as much of your /scratch space as possible to aid the data transfer. Checkpointing your jobs, limiting how long jobs run with --time, and deleting any unnecessary log and output files generated by jobs will also help make the maintenance process easier and faster.
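As a rough illustration of choosing a --time value, the sketch below computes the longest limit that would still let a job finish before the maintenance window opens. The function name, the one-hour safety margin, and the example dates are assumptions made for illustration only; this notice does not give a year or an exact start time, so please substitute the real values.

```python
from datetime import datetime, timedelta


def max_time_limit(now: datetime, maintenance_start: datetime,
                   margin: timedelta = timedelta(hours=1)) -> str:
    """Return the longest limit, formatted as D-HH:MM:SS, that ends before the window.

    Hypothetical helper: the margin and both timestamps are caller-supplied assumptions.
    """
    remaining = maintenance_start - now - margin
    if remaining <= timedelta(0):
        raise ValueError("no time left before the maintenance window")
    total = int(remaining.total_seconds())
    days, rest = divmod(total, 86400)      # whole days
    hours, rest = divmod(rest, 3600)       # whole hours left over
    minutes, seconds = divmod(rest, 60)    # minutes and seconds left over
    return f"{days}-{hours:02d}:{minutes:02d}:{seconds:02d}"


if __name__ == "__main__":
    # Placeholder dates: replace with the current time and the actual window start.
    limit = max_time_limit(datetime(2019, 6, 3, 9, 0), datetime(2019, 6, 6, 0, 0))
    print(f"sbatch --time={limit} myjob.sh")  # myjob.sh is a placeholder script name
```

The printed value can be passed to sbatch with --time. A job whose limit fits before the window can be scheduled, while a job with a longer limit would likely wait until the cluster is back.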
The improvements to the Discovery cluster that you can expect after this maintenance is complete include:
- Upgraded scheduler (slurm), 19.5.0 from 17.11.6
- Upgraded CentOS, 7.6 from 7.5
- OS security updates installed
- Increased /scratch space, 2.3 PB from 1.1 PB
- A faster network for most of the compute infrastructure
You will be informed should there be any changes to this maintenance window. Updates will also be communicated via the Northeastern ITS Status Page.
Thank you for your patience during this work to improve the university’s research computing environment,