Can DGL use pipeline parallelism?

gamdwk · August 7, 2023, 8:28am

I have seen some papers that use pipeline parallelism to overlap sampling and computation. Is this feature included in DGL? I have also looked at pipeline parallelism in ordinary machine learning, which splits mini-batches into micro-batches; there are examples of this in PyTorch. Is it the same thing as pipeline parallelism in GNNs?

peizhou001 · August 9, 2023, 9:41am

Hi @gamdwk, for your questions:

Is this feature included with DGL?
Yes, DGL does overlap sampling and computation.
Is it the same thing as the pipeline parallelism in GNN?
No. In ordinary ML, pipeline parallelism usually means splitting the model into several stages that form a pipeline, so that the stages can process different micro-batches simultaneously. Overlapping sampling with computation is different, because sampling is not part of the model.
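To make the model-side meaning of pipeline parallelism concrete, here is a minimal sketch of a forward-only micro-batch schedule. It is not DGL or PyTorch code; the function name `pipeline_schedule` and its parameters are made up for illustration, and it assumes every stage takes exactly one time step per micro-batch.

```python
def pipeline_schedule(num_stages, num_microbatches):
    """Return, for each time step, the (stage, micro_batch) pairs that run concurrently.

    Hypothetical helper: assumes each stage needs one time step per micro-batch
    and that micro-batch m enters stage s at time step m + s.
    """
    total_steps = num_stages + num_microbatches - 1
    schedule = []
    for t in range(total_steps):
        active = [
            (stage, t - stage)
            for stage in range(num_stages)
            if 0 <= t - stage < num_microbatches
        ]
        schedule.append(active)
    return schedule


if __name__ == "__main__":
    # Print which stage works on which micro-batch at each time step.
    for t, active in enumerate(pipeline_schedule(num_stages=3, num_microbatches=4)):
        print(t, active)
```

Once the pipeline is full, every stage of the model is busy with a different micro-batch. Sampling never appears as a stage here, which is the difference described above.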

yofufufufu · August 15, 2023, 1:51pm

Hello, about pipeline parallelism in DGL: I think DGL overlaps sampling and computation only when using CPU sampling (if the dataloader option use_prefetch_thread is True).
If GPU sampling is used, use_prefetch_thread must be False, so can DGL overlap GPU sampling and computation? If so, how does DGL achieve that? Could you please show me the code snippet?
Looking forward to your reply!

yofufufufu · September 4, 2023, 5:43pm

Can anyone help? Otherwise I will create a new topic.

peizhou001 · September 14, 2023, 1:57am

Sorry for the late reply! As far as I know, with the DGL dataloader you can spawn different processes for training and sampling, so I think computation and sampling can be overlapped, for example:
Process A: computing batch 1
Process B: sampling batch 2
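The pattern above can be sketched as a producer/consumer pipeline. This is only an illustration using the Python standard library, not DGL's implementation: it uses a thread instead of a separate process to keep the example short, and `sample_batches` and `train` are hypothetical stand-ins for real sampling and training.

```python
import queue
import threading


def sample_batches(num_batches, out_queue):
    """Producer: stands in for neighbor sampling and pushes batches into a bounded queue."""
    for batch_id in range(num_batches):
        # Fake "sampled node IDs" for this mini-batch.
        minibatch = list(range(batch_id * 4, batch_id * 4 + 4))
        out_queue.put((batch_id, minibatch))
    out_queue.put(None)  # Sentinel: no more batches.


def train(num_batches, prefetch=2):
    """Consumer: works on batch i while the producer may already be sampling batch i+1."""
    batches = queue.Queue(maxsize=prefetch)
    sampler = threading.Thread(
        target=sample_batches, args=(num_batches, batches), daemon=True
    )
    sampler.start()
    results = []
    while True:
        item = batches.get()
        if item is None:
            break
        batch_id, minibatch = item
        # Stand-in for the forward/backward pass on this batch.
        results.append((batch_id, sum(minibatch)))
    sampler.join()
    return results


if __name__ == "__main__":
    print(train(num_batches=3))
```

The bounded queue is what limits how far sampling can run ahead of computation; with `prefetch=2` the producer blocks once two batches are waiting.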

yofufufufu · September 14, 2023, 9:42am

Thanks for your reply!

Yes, I agree. But I think this feature only works when the use_prefetch_thread option of the DGL dataloader is True.
And when using GPU sampling, it cannot be True:

https://github.com/dmlc/dgl/blob/47c6fb1ff0763d137313ca49aa9b4f1a7630b28a/python/dgl/dataloading/dataloader.py#L1075-L1079

            if use_prefetch_thread is True:
                raise ValueError(
                    "use_prefetch_thread=True is only effective when device=cuda and "
                    "sampling is performed on CPU."
                )

So with CPU sampling, yes. But with GPU sampling, DGL cannot overlap computation and sampling. Am I wrong?

peizhou001 · September 19, 2023, 4:31am

Within one process, yes.
But consider this perspective: several processes may be involved in computation and sampling. A computing process can obtain sampling results from a separate sampling process, which implies a significant degree of overlap between them.

yofufufufu · September 19, 2023, 10:57am

Thanks for your reply!
In fact, I think this kind of overlap on the GPU could be implemented with CUDA streams. I tried to find code for that in the DGL source, but could not.
So to confirm again: DGL has not implemented this kind of overlap on the GPU?

peizhou001 · September 20, 2023, 3:35am

When one GPU does both computation and sampling, the only way to overlap them is to use different CUDA streams, and I need to double-check whether DGL does that. In the other cases there is always some overlap.
In addition, CUDA streams are supported, but only for the case of sampling on the CPU and computing on the GPU. See https://github.com/dmlc/dgl/blob/c298223f5de09e9ee265cf6a9fb9145b692b4c5b/python/dgl/dataloading/dataloader.py#L842 for more details.

yofufufufu · September 20, 2023, 7:03am

Thanks for your help

peizhou001 · September 20, 2023, 7:52am

Confirmed: there is no kernel overlap between sampling and computation. Some additional notes:

Even using multiple CUDA streams cannot guarantee parallelism, since the SM resources could all be occupied by one task.
Both tasks are computation-intensive, so overlapping them may bring little benefit.
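A back-of-the-envelope way to see the two points above is the toy model below. It is a deliberate simplification, not a measurement of DGL or of any real GPU: it assumes each task keeps a fixed fraction of the SMs busy for its whole duration, and the function names and parameters are invented for this sketch.

```python
def overlapped_time_lower_bound(t_sample, f_sample, t_compute, f_compute):
    """Lower bound on wall time when two GPU tasks share one device (toy model).

    t_*: time each task takes when run alone, in seconds.
    f_*: assumed fraction of the GPU's SMs the task keeps busy (0..1).
    """
    # Total SM-time demanded by both tasks, in units of "whole GPU" seconds.
    sm_seconds = t_sample * f_sample + t_compute * f_compute
    # Neither task can finish faster than it does alone, and the GPU
    # cannot deliver more than one "whole GPU" second per second.
    return max(t_sample, t_compute, sm_seconds)


def best_case_speedup(t_sample, f_sample, t_compute, f_compute):
    """Best possible speedup of overlapping versus running the tasks back to back."""
    sequential = t_sample + t_compute
    overlapped = overlapped_time_lower_bound(t_sample, f_sample, t_compute, f_compute)
    return sequential / overlapped


if __name__ == "__main__":
    # Both tasks saturate the SMs: overlapping cannot help in this model.
    print(best_case_speedup(2.0, 1.0, 3.0, 1.0))
    # Both tasks leave SMs idle: overlapping could help in this model.
    print(best_case_speedup(2.0, 0.25, 2.0, 0.5))
```

Under these assumptions, two kernels that each fill the GPU gain nothing from running on separate streams, while light kernels could in principle overlap.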

yofufufufu · September 20, 2023, 8:20am

Thanks for your detailed reply!
