Dear Discovery Cluster Users:
If you do not use the /scratch folder on the Discovery Cluster, please ignore this message. /scratch is currently 91% full:
[root@discovery1 admin]# df -ha /scratch
Filesystem            Size  Used Avail Use% Mounted on
mghpccnfs1.nunet.neu.edu:/ifs/mghpcc/nfs/research1_scratch
                       50T   46T  4.8T  91% /scratch
[root@discovery1 admin]#
Please note that if /scratch reaches 100% full, the cluster will go down and all user jobs will stop, because the compute and login nodes will hang. I will then have to restart the entire cluster and everything associated with it: admin and login nodes, 300+ compute nodes, scheduler, network, licensing, HDFS, etc. This will mean a downtime of at least a couple of days, maybe longer, and I will have to spend the next week in Holyoke, MA bringing the cluster back to full operational status.
A list of users who are using more than 200-300GB on /scratch will follow in a separate email shortly. The script that reports usage has itself slowed down, because the /scratch partition is nearly full and is running very slowly.
I may have to close the cluster queues and login nodes while I work with the high-usage users to move data out of the cluster, until we get back to a reasonable /scratch usage level. Anything above 200-300GB is considered excessive usage unless you have informed us about it ahead of time. You are required to move your files off /scratch once your work is done; they should not remain there for more than 2-3 weeks. /scratch is not for long-term storage.
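For users who want to check their own footprint, the following is a minimal sketch, not an official cluster tool. The function name scratch_report and the /scratch/$USER layout are assumptions for illustration; adjust the path to wherever your files actually live. It prints the total size of a directory and lists files not modified in more than a given number of days (21 by default, the upper end of the 2-3 week guideline).

```shell
#!/bin/sh
# Hypothetical helper (not a cluster-provided tool): report total usage of a
# directory and list files that have not been modified for more than N days.

scratch_report() {
    dir=$1
    days=${2:-21}   # default: 21 days, the upper end of the 2-3 week guideline

    # Total space used under the directory, in human-readable units.
    du -sh "$dir"

    # Regular files last modified more than $days days ago:
    # candidates to move off /scratch.
    find "$dir" -type f -mtime +"$days" -print
}

# Example call, assuming your files are under /scratch/$USER:
#   scratch_report "/scratch/$USER" 21
```

Note that du and find both walk the whole directory tree, so they may themselves be slow while the partition is under load; running them once on your own directory is preferable to repeated scans.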
Users above 200-300GB /scratch usage will be held responsible should it get to the stage where the cluster goes down.
I hate to enforce storage limits on /scratch, but that may be the only option left. With over 450 users on the cluster, please use /scratch responsibly. If we do impose /scratch limits, no user will get more than 75GB, which would be a pity.
Thank you for your patience and continued cooperation in this matter. I hope all users understand the critical situation here.
Best
Nilay