DGL classic GCN Computation

Hi,
I tried to train the GCN example from the GitHub repository (“dgl\examples\pytorch\gat”) and it works fine. Now I tried launching the training several times in parallel (2 or 3 times) and, unlike my Torch experiments, the execution time of the 3 runs is about 3 times longer than expected. Why is that?

Single training on a Quadro P1000:

  • 250/250 [00:03<00:00, 76.79it/s]

Launched 2 times in parallel:

  • 250/250 [00:05<00:00, 43.31it/s]
  • 250/250 [00:05<00:00, 43.95it/s]

The computation time is twice that of the single training, but when they run in parallel on a single GPU, shouldn’t it be about the same?

Thanks for your help,

  • dgl\examples\pytorch\gat is an example for Graph Attention Network (GAT).
  • What do you mean by “Torch experiments”?
  • By “2 or 3 times”, did you mean multi-GPU training?
  • What’s the source of the expected execution time?
  • How did you get the time?

Hi Mufeili, thanks for your answer.

Yes, I meant dgl\examples\pytorch\gcn (I also tried gat and made the same observation).

  • By “torch experiments”, I mean a toy GCN model extracted from a PyTorch example on GitHub, which I launched once on one GPU (gpu:0) and then twice in parallel on the same GPU (gpu:0).

  • What’s the source of the expected execution time?
    I compare the execution times of:

    • A single training executed on GPU (gpu:0): 250/250 [00:03<00:00, 76.79it/s]
    • Two trainings executed in parallel on GPU (gpu:0): 250/250 [00:05<00:00, 43.31it/s]; 250/250 [00:05<00:00, 43.95it/s]
  • By “2 or 3 times”, did you mean multi-GPU training?
    When I said 2 or 3 times, I meant that I ran “python train.py --n-epoch 250” 3 times on the same GPU (gpu:0), so the models are independent and execute in parallel on the same GPU (gpu:0). I measure the time with tqdm in the epoch loop (see the sketch after this list).
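
For context, here is a minimal sketch of the timing setup described above, assuming a standard DGL-style training loop. The names `model`, `g`, `features`, `labels`, and `train_mask` are placeholders, not the exact variables from the example script:

```python
# Hedged sketch of timing the epoch loop with tqdm, as described above.
# `model(g, features)` assumes a DGL-style forward signature; adjust to the actual script.
import torch
import torch.nn.functional as F
from tqdm import tqdm

def train(model, g, features, labels, train_mask, n_epochs=250, lr=1e-2):
    optimizer = torch.optim.Adam(model.parameters(), lr=lr)
    # tqdm wraps the epoch loop, which produces the "250/250 [00:03<00:00, 76.79it/s]"
    # readouts quoted in this thread.
    for _ in tqdm(range(n_epochs)):
        model.train()
        logits = model(g, features)
        loss = F.cross_entropy(logits[train_mask], labels[train_mask])
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```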

So my problem is: why, when I execute 2 GCN trainings with the DGL library on the same GPU, is the execution time twice as long as for a single one, while with the same model (GCN) in pure PyTorch the execution time is the same whether I launch 1 or 2 trainings in parallel on the same GPU?

Maybe it is something related to CUDA stream properties? I read that these are not currently handled by DGL?
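
For reference, this is what explicitly placing work on a non-default CUDA stream looks like in plain PyTorch; whether DGL’s kernels honour a user-selected stream is exactly the question raised above, so treat this as an illustration rather than a workaround:

```python
# Illustration of CUDA streams in plain PyTorch (not DGL-specific).
import torch

assert torch.cuda.is_available()
stream = torch.cuda.Stream()                 # a non-default stream on the current device
x = torch.randn(4096, 4096, device="cuda")

with torch.cuda.stream(stream):              # kernels launched here are enqueued on `stream`
    y = x @ x

torch.cuda.synchronize()                     # wait for all streams before reading `y`
print(y.shape)
```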

Thanks a lot

One possibility is that DGL achieves higher GPU utilization, and as a result the two experiments compete with each other when sharing the same GPU. Do you have the numbers for the pure PyTorch experiments?
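
If it helps, one way to check this hypothesis is to poll GPU utilization while a single DGL training runs: if one run already keeps the GPU near 100%, a second run can only time-slice with it. A diagnostic sketch using the `pynvml` package (my own assumption, not part of the example scripts):

```python
# Diagnostic sketch: sample the utilization of gpu:0 while a training runs in another process.
import time
import pynvml

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)   # gpu:0, as in the experiments above
for _ in range(30):                             # sample roughly once per second for 30 s
    util = pynvml.nvmlDeviceGetUtilizationRates(handle)
    print(f"GPU: {util.gpu}%  memory: {util.memory}%")
    time.sleep(1)
pynvml.nvmlShutdown()
```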

Hi,

Yes, it’s approximately the same: ~3 s, ~80 it/s. And in pure PyTorch it stays the same (~3 s, ~80 it/s) when I run 3 parallel trainings on the same GPU.
Is there a way to partition GPU utilization with DGL so that the trainings do not compete with each other on the same GPU?

Thanks,

Updating the thread: this bug still occurs in the latest version.
It seems to come from the CUDA integration in the back-end, where the competition mentioned previously occurs. Since last December, I have not found any solution other than not running two DGL trainings on the same machine.

Thanks,
