Hello Discovery users,

We have identified the cause of the scheduling issue affecting Slurm on Discovery. Our team found that slurmctld was configured to defer scheduling of jobs whenever its outstanding workload was 32 threads or more. A normal workload is currently about 50 threads, so slurmctld was deferring scheduling for new jobs. We have changed this configuration so that this is no longer an issue. When we made this change, some jobs were killed due to limitations of the scheduler. We recommend that you check your jobs and resubmit anything that is missing.

We apologize for any inconvenience this caused, and we thank you for your patience while we investigated this issue.

Thanks,

Julia



____________________________
Julia Cho
Technical Writer, Research Computing
Northeastern University
216 Mass Ave. Boston, MA 02115
(C) 617-999-6245 (O) 617-373-3906

