I am learning about GCNs, and DGL seems to be a very interesting framework for trying out multiple graph convolution methods. My current objective is to run some models (such as GAT and GraphSAGE) on larger datasets (PPI and Reddit, for instance).
The PPI dataset contains multiple graphs, so mini-batching can be done at the graph level, as in https://github.com/dmlc/dgl/blob/master/examples/pytorch/gat/train_ppi.py
For Reddit in particular, I would need a node-level mini-batch implementation that splits the graph for each batch as follows:
- Sample seed nodes for each mini-batch
- Sample the required neighbours of each seed node, according to the model's requirements (e.g. a predefined number of sampled neighbours per hop, up to 2 hops).
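To make the two steps above concrete, here is a toy sketch of what I mean, using a plain adjacency dict instead of a DGL graph (all names here are illustrative, not DGL API):

```python
import random

# Tiny illustrative graph: node -> list of neighbours.
graph = {
    0: [1, 2],
    1: [0, 3],
    2: [0, 3],
    3: [1, 2, 4],
    4: [3],
}

def sample_minibatch(graph, batch_size, fanout, num_hops, rng=random):
    # Step 1: sample seed nodes for the mini-batch.
    seeds = rng.sample(list(graph), batch_size)
    # Step 2: for each hop, sample up to `fanout` neighbours of the
    # current frontier, collecting every node the model would need.
    frontier, sampled = set(seeds), set(seeds)
    for _ in range(num_hops):
        nxt = set()
        for node in frontier:
            neighbours = graph[node]
            nxt.update(rng.sample(neighbours, min(fanout, len(neighbours))))
        sampled |= nxt
        frontier = nxt
    return seeds, sampled  # seeds plus the nodes needed to compute them

seeds, block = sample_minibatch(graph, batch_size=2, fanout=2, num_hops=2)
```

This is only the sampling logic; in practice the per-batch node set would have to be turned into a subgraph the model can run on.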
It seems to me that DGL’s LayerSampler would be enough for this task; however, I am not sure how I could use the NodeFlow objects returned by LayerSampler with a PyTorch DataLoader for mini-batch training. Is there an example similar to what I need to do?
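For clarity, this is the overall loop shape I have in mind, with the DGL-specific parts stubbed out: an iterable sampler plays the role a PyTorch DataLoader usually does, yielding one sampled block per training step. `make_sampler` and the batches it yields are placeholders, not real DGL objects.

```python
def make_sampler(num_nodes, batch_size):
    # Stand-in for a DGL sampler: yields one batch of seed node IDs
    # per training step (a real sampler would yield a NodeFlow).
    nodes = list(range(num_nodes))
    for start in range(0, num_nodes, batch_size):
        yield nodes[start:start + batch_size]

steps = 0
for block in make_sampler(num_nodes=10, batch_size=4):
    # Model forward/backward on `block` would go here.
    steps += 1
```

My question is essentially how to fit the real NodeFlow-producing sampler into this kind of loop (or into a DataLoader).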
I don’t have much experience with PyTorch or DGL, so I appreciate your help on this issue!