How to combine DGL with torch.nn.DataParallel?

jiayouwyhit · March 3, 2019, 3:48am

I am trying to use torch.nn.DataParallel (https://pytorch.org/tutorials/beginner/blitz/data_parallel_tutorial.html) to speed up training of the TreeLSTM model from the tutorial (https://docs.dgl.ai/en/latest/tutorials/models/2_small_graph/3_tree-lstm.html). The main changes I made are as follows:

=======
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print('We have', torch.cuda.device_count(), 'GPUs!')

model = TreeLSTM(trainset.num_vocabs, x_size, h_size, trainset.num_classes, dropout)
model = torch.nn.DataParallel(model)
model.to(device)

=======

But I always get the following error:

======

AttributeError: 'tuple' object has no attribute 'graph'
(which points to this line of code: g = batch.graph)

======

Any suggestions or comments on this issue?
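One plausible reading of the traceback, offered as a sketch rather than a diagnosis: DataParallel splits the forward arguments across GPUs with a recursive scatter. Assuming the PyTorch 1.0-era logic, tensors are chunked along dimension 0, tuples are recursed into and reassembled with zip(), and any other object is copied whole to every device. A batch that is a namedtuple with a graph field (which the batch.graph access suggests) would therefore reach each replica as a plain tuple with no .graph attribute, and the DGL graph itself would be replicated rather than split. The pure-Python mimic below uses lists as stand-ins for tensors; naive_scatter and Batch are hypothetical names, not PyTorch or DGL APIs.

```python
from collections import namedtuple

# Hypothetical stand-in for a namedtuple batch carrying a graph and labels.
Batch = namedtuple("Batch", ["graph", "label"])


def naive_scatter(obj, n_devices):
    """Simplified mimic of a recursive DataParallel-style scatter.

    - lists play the role of tensors and are split into n_devices chunks
    - tuples are recursed into and re-zipped, which yields *plain* tuples,
      so a namedtuple loses its field names
    - anything else (e.g. a graph object) is replicated to every device
    """
    if isinstance(obj, list):
        size = -(-len(obj) // n_devices)  # ceiling division, like chunking dim 0
        return [obj[i * size:(i + 1) * size] for i in range(n_devices)]
    if isinstance(obj, tuple) and len(obj) > 0:
        return list(zip(*(naive_scatter(x, n_devices) for x in obj)))
    return [obj for _ in range(n_devices)]


if __name__ == "__main__":
    batch = Batch(graph="g", label=[0, 1, 2, 3])
    shards = naive_scatter(batch, 2)
    print(shards)                            # plain tuples, one per device
    print(hasattr(shards[0], "graph"))       # the .graph field is gone
```

Here the "graph" is copied to both shards while the labels are halved, which is also why simply restoring the namedtuple type would not by itself make the batches consistent.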

VoVAllen · March 3, 2019, 8:09am

Hi, you can refer to this issue. We recommend using torch.distributed rather than DataParallel. We will make a tutorial for multi-GPU training soon.

jiayouwyhit · March 4, 2019, 4:46am

@VoVAllen Thanks a lot, Allen. Would you mind briefly explaining why DGL works better with torch.distributed than with DataParallel? I am not quite clear on this.

VoVAllen · March 4, 2019, 1:23pm

The key reason is that we found PyTorch does not release the GIL properly when the computational load is light. For example, if each step is very fast, using DataParallel (multithreading) with 4 GPUs for a given batch size can take about the same time as using 1 GPU, whereas ideally it should take about 1/4 of the time.
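The effect can be illustrated with a pure-Python analogy (not a PyTorch measurement): DataParallel runs each replica's forward pass in its own Python thread, and on a standard CPython build only one thread executes Python bytecode at a time. In the sketch below, python_heavy_step is a hypothetical workload that holds the GIL throughout, so running it on 4 threads should take roughly as long as running it 4 times serially.

```python
import threading
import time


def python_heavy_step(n):
    # Pure-Python loop that holds the GIL the whole time; a hypothetical
    # stand-in for a training step dominated by Python-side overhead.
    total = 0
    for i in range(n):
        total += i * i
    return total


def run_serial(n_workers, n):
    # One step after another on a single thread.
    return [python_heavy_step(n) for _ in range(n_workers)]


def run_threaded(n_workers, n):
    # One thread per "device", similar in spirit to DataParallel's replicas.
    results = [None] * n_workers

    def target(idx):
        results[idx] = python_heavy_step(n)

    threads = [threading.Thread(target=target, args=(i,)) for i in range(n_workers)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return results


if __name__ == "__main__":
    for runner in (run_serial, run_threaded):
        start = time.perf_counter()
        runner(4, 2_000_000)
        print(runner.__name__, f"{time.perf_counter() - start:.2f}s")
```

Separate processes each have their own interpreter and GIL, which is the property the multiprocessing approach relies on.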

Therefore we recommend using torch.distributed (multiprocessing) for multi-GPU training.
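A minimal sketch of the multiprocessing approach, assuming torch.distributed with the Gloo backend on CPU so it can run on any machine; the toy nn.Linear model and random TensorDataset are hypothetical stand-ins for the TreeLSTM model and SST data, and free_tcp_init_method and train_worker are illustrative names. Each process builds its own model wrapped in DistributedDataParallel and reads its own shard of the data through DistributedSampler, so no batch object is ever scattered across threads. With DGL, each process would batch its own graphs as in the single-GPU tutorial.

```python
import socket

import torch
import torch.distributed as dist
import torch.nn as nn
from torch.nn.parallel import DistributedDataParallel as DDP
from torch.utils.data import DataLoader, TensorDataset
from torch.utils.data.distributed import DistributedSampler


def free_tcp_init_method():
    # Ask the OS for an unused localhost port for the rendezvous.
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        s.bind(("127.0.0.1", 0))
        port = s.getsockname()[1]
    return f"tcp://127.0.0.1:{port}"


def train_worker(rank, world_size, init_method, epochs=2):
    # Every process joins the same group; on GPUs "nccl" is the usual backend.
    dist.init_process_group("gloo", init_method=init_method,
                            rank=rank, world_size=world_size)
    torch.manual_seed(0)  # same initial weights and data on every rank

    # Hypothetical toy data standing in for the SST trees.
    data = TensorDataset(torch.randn(64, 10), torch.randint(0, 5, (64,)))
    # Each rank iterates over a disjoint shard of the dataset.
    sampler = DistributedSampler(data, num_replicas=world_size, rank=rank)
    loader = DataLoader(data, batch_size=8, sampler=sampler)

    model = DDP(nn.Linear(10, 5))  # gradients are all-reduced across ranks
    optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
    loss_fn = nn.CrossEntropyLoss()

    last_loss = float("nan")
    for epoch in range(epochs):
        sampler.set_epoch(epoch)  # reshuffle the shards each epoch
        for x, y in loader:
            optimizer.zero_grad()
            loss = loss_fn(model(x), y)
            loss.backward()
            optimizer.step()
            last_loss = loss.item()

    dist.destroy_process_group()
    return last_loss
```

Calling train_worker(0, 1, free_tcp_init_method()) runs a single-process job. For real multi-GPU training you would launch one process per GPU, for example with torch.multiprocessing.spawn(train_worker, args=(world_size, init_method), nprocs=world_size) under an if __name__ == "__main__": guard, switch the backend to "nccl", and move the model and batches to cuda:{rank} (passing device_ids=[rank] to DistributedDataParallel).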

jiayouwyhit · March 4, 2019, 3:48pm

Allen, thanks a lot for your clarification!