
Dear Discovery Cluster Users:

IF YOU DO NOT USE OR PLAN TO USE THE "par-gpu" QUEUE ON THE DISCOVERY CLUSTER, PLEASE IGNORE THE REST OF THIS EMAIL.

If you do use or plan to use the "par-gpu" queue on the Discovery Cluster, please note the following and follow it without exception:

1) The "par-gpu" queue consists of 32 compute nodes, each of which has only one NVIDIA Tesla K20m GPU.

2) Each compute node has 32 logical cores, and hence 32 LSF compute slots. With 32 compute nodes, the queue therefore has 32 GPUs in total.

3) When submitting interactive or batch jobs, request all 32 cores (slots) on every compute node you use. LSF then has no free slots left on that node to give to anyone else, so the node, and its GPU, is effectively reserved for you. If you request fewer than 32 slots, LSF may place another user's job on the same node, and that job may then start using the same GPU. You do not want this: for debugging, code development, testing, and production runs alike, you want one or more GPUs to yourself.

4) For interactive jobs, use one of the following commands:
 
    a) without X11 forwarding:  bsub -Is -n 32 -R "span[ptile=32]" -q par-gpu /bin/bash
 
    b) with X11 forwarding:     bsub -Is -XF -n 32 -R "span[ptile=32]" -q par-gpu /bin/bash
 
(Note: to get more than one node for an interactive session, set -n to a multiple of 32, e.g. -n 64 for two nodes. After you exit, run "bjobs -w" to confirm that your interactive session has ended; if it has not, kill it manually with "bkill -r <jobid>" and check again. Also note that the time limit for this queue is 24 hours for both interactive and batch runs.)

5) For batch jobs, your "#BSUB" directives must include all of the following, in addition to any others you use:
 
    a) #BSUB -n 32    (or a multiple of 32)
    b) #BSUB -R "span[ptile=32]"
    c) #BSUB -q "par-gpu"

(Note: to use more than one node in a batch run, set -n to a multiple of 32, e.g. -n 64 for two nodes. Also note that the time limit for this queue is 24 hours for both interactive and batch runs.)
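Putting items 3 and 5 together, a complete batch script might look like the sketch below. It is an illustrative example, not an official template: the job name, the output and error file names, the -W run-limit line, the nvidia-smi check, and the program ./my_gpu_app are assumptions or placeholders to replace with your own.

```shell
#!/bin/bash
# Placeholder job name and log files (%J expands to the LSF job ID).
#BSUB -J gpu_test
#BSUB -o gpu_test.%J.out
#BSUB -e gpu_test.%J.err
# Required for par-gpu: take all 32 slots on each node so LSF gives
# you the whole node, and its single K20m GPU, to yourself.
#BSUB -n 32
#BSUB -R "span[ptile=32]"
#BSUB -q "par-gpu"
# Assumed run limit (hours:minutes); 24 hours is this queue's maximum.
#BSUB -W 24:00

# Record which job and host this script is running on.
echo "Job $LSB_JOBID running on $(hostname)"

# Check that the GPU is visible (assumes nvidia-smi is on the PATH).
nvidia-smi

# Placeholder: replace with your own GPU program.
./my_gpu_app
```

Save it under a name of your choosing (gpu_job.bsub is just an example) and submit it with "bsub < gpu_job.bsub", so that LSF reads the #BSUB lines from standard input. For two nodes, change -n 32 to -n 64 and keep span[ptile=32].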

Thank you for your patience.

Best
Nilay

########################################################################

To unsubscribe from the DISCOVERY list, click the following link:
http://listserv.neu.edu/cgi-bin/wa?SUBED1=DISCOVERY&A=1