Disabling sampling in distributed GraphSAGE

Hello.

I am currently experimenting with the distributed GraphSAGE code provided by DGL. To compare against some other systems that do not do sampling, is there an easy way to change the code, or an argument to pass in, so that the neighborhood sampling done each epoch is disabled?

In other words, I want an epoch in DGL’s GraphSAGE to do only one forward/backward step over the entire training set.

Thanks,
Loc Hoang

Sure. You just need to use the whole graph as the input when computing the logits: in logits = model(g, input), g is a sampled subgraph in the example, so substitute the whole graph for g there.
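Roughly, a single-machine sketch of that full-graph training step might look like the following. The SAGE model class, random graph, features, and masks here are illustrative stand-ins, not the exact objects in the distributed example:

```python
import dgl
import dgl.nn as dglnn
import torch
import torch.nn as nn
import torch.nn.functional as F

class SAGE(nn.Module):
    """Two-layer GraphSAGE that takes the whole graph, not sampled blocks."""
    def __init__(self, in_feats, hid_feats, n_classes):
        super().__init__()
        self.conv1 = dglnn.SAGEConv(in_feats, hid_feats, 'mean')
        self.conv2 = dglnn.SAGEConv(hid_feats, n_classes, 'mean')

    def forward(self, g, x):
        h = F.relu(self.conv1(g, x))
        return self.conv2(g, h)

# Illustrative stand-in data: a random graph with random features/labels.
g = dgl.add_self_loop(dgl.rand_graph(1000, 5000))
g.ndata['feat'] = torch.randn(1000, 16)
g.ndata['label'] = torch.randint(0, 4, (1000,))
train_mask = torch.rand(1000) < 0.8

model = SAGE(16, 32, 4)
opt = torch.optim.Adam(model.parameters(), lr=0.01)

for epoch in range(10):
    # One epoch == one forward/backward pass over the full graph,
    # instead of iterating over sampled mini-batches.
    logits = model(g, g.ndata['feat'])
    loss = F.cross_entropy(logits[train_mask], g.ndata['label'][train_mask])
    opt.zero_grad()
    loss.backward()
    opt.step()
    print(f'epoch {epoch}: loss {loss.item():.4f}')
```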

Also, why do you want to use the distributed version if you can load the whole graph into memory?

You can also refer to our inference example on the full graph: https://github.com/dmlc/dgl/blob/master/examples/pytorch/graphsage/experimental/train_dist.py#L84
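Building on the illustrative sketch above, a full-graph evaluation after training could look like this (again a sketch, reusing the hypothetical model, graph, and mask from earlier, not the linked example's exact code):

```python
# Evaluate on the full graph in a single pass.
model.eval()
with torch.no_grad():
    logits = model(g, g.ndata['feat'])
    pred = logits.argmax(dim=1)
    test_mask = ~train_mask  # illustrative train/test split
    acc = (pred[test_mask] == g.ndata['label'][test_mask]).float().mean()
    print(f'test accuracy: {acc.item():.4f}')
```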

Thanks for the information!

I’m distributing the computation to improve compute time rather than to get more memory, since the compute cost of a GNN is quite high.

If I have any more questions I’ll let you know.

  • Loc