Inductive Deep Graph Infomax

lmtoan · March 1, 2020, 7:10am

Hi, I want to modify the DGI example for inductive settings with PPI/Reddit datasets. Would appreciate any pointer on how to start.

Thanks!

Lgcsimoes · March 1, 2020, 7:02pm

Hi! I think the main change you would need is to implement a mini-batcher that subsamples the PPI/Reddit datasets, so that each mini-batch fits in memory.

Perhaps the PPI dataset can fit in memory if you build batches of complete graphs (such as implemented here). Reddit, however, consists of a single graph, which could be split using DGL’s NeighborSampler (as shown here). Even so, I understand that NeighborSampler has some limitations that DGL’s maintainers are planning to overcome by implementing a new feature. Maybe they can comment more on that.
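To make the idea concrete, here is a rough sketch in plain Python of what a mini-batcher with neighbor sampling does conceptually. It is not DGL’s NeighborSampler and does not use its API; the names `sample_neighborhood`, `minibatches`, `fanout` and the toy adjacency list are all made up for illustration.

```python
import random


def minibatches(node_ids, batch_size):
    """Yield consecutive slices of seed nodes, one slice per mini-batch."""
    for start in range(0, len(node_ids), batch_size):
        yield node_ids[start:start + batch_size]


def sample_neighborhood(adj, seeds, fanout, num_hops, rng):
    """Expand `seeds` for `num_hops` hops, keeping at most `fanout`
    randomly chosen neighbors per node at each hop (hypothetical helper)."""
    visited = set(seeds)
    frontier = list(seeds)
    for _ in range(num_hops):
        next_frontier = []
        for node in frontier:
            neighbors = adj.get(node, [])
            k = min(fanout, len(neighbors))
            for nb in rng.sample(neighbors, k):
                if nb not in visited:
                    visited.add(nb)
                    next_frontier.append(nb)
        frontier = next_frontier
    return visited


# Toy undirected graph stored as an adjacency list (illustrative data only).
adj = {0: [1, 2, 3], 1: [0, 4], 2: [0], 3: [0, 5], 4: [1], 5: [3]}
all_nodes = sorted(adj)
rng = random.Random(0)

seed_batches = list(minibatches(all_nodes, batch_size=2))
batches = [
    sorted(sample_neighborhood(adj, seeds, fanout=2, num_hops=2, rng=rng))
    for seeds in seed_batches
]
```

Each entry of `batches` is the node set of one sampled subgraph. Only that subgraph, not the full graph, would need to be in memory for a training step.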

BTW, if you plan to follow the paper’s implementation, you will of course need to change the neural network architecture accordingly (the transductive cases were solved using a one-layer GCN model).

mufeili · March 2, 2020, 7:48am

In addition to the great suggestions by @Lgcsimoes, you may find our example of GAT on PPI to be helpful: https://github.com/dmlc/dgl/blob/master/examples/pytorch/gat/train_ppi.py

lmtoan · March 4, 2020, 9:25pm

Thanks a lot for the recommendations. Since I want to generate embeddings for new nodes that attach to the original graph during inference, not necessarily for a completely new, separate graph as in PPI, I think I’m leaning more toward the Reddit dataset.

The NeighborSampler is interesting. Is it able to generate overlapping subgraphs for train/test? For my problem, at test time, I have a small test graph consisting of old + new nodes, for which I need to update existing embeddings or generate new ones. Edges between old nodes might already have been seen in training, but edges between new nodes or between old and new nodes are unseen.

On a separate note, if I have an N-ary tree dataset similar to the one in the Tree-LSTM tutorial (except that nodes can belong to multiple trees), can I set seed_nodes = root and num_hops = tree-depth to batch-train my DGI, with each batch being a single tree?

Thanks again!

mufeili · March 5, 2020, 12:04pm

1. For the temporal graph scenario you describe, I think it will be easier to directly use two separate graphs for training and test. You can then use NeighborSampler to sample from each of them during training and test, respectively.
2. I’m not very familiar with Tree-LSTM. @zihao Any insight?

lmtoan · March 5, 2020, 5:34pm

For 1., could you elaborate a bit more on that? Originally I was thinking of setting up overlapping train_nid and test_nid (since a node can appear in both) and using those as seed_nodes in NeighborSampler.

mufeili · March 5, 2020, 5:44pm

With NeighborSampler, you cannot avoid sampling particular edges from a graph (e.g. the edges between old and new nodes). Therefore I recommend you construct one graph for training and one for test. The training graph consists of old nodes only.
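As a rough illustration of the split, here is a sketch in plain Python that works on a bare edge list rather than a DGL graph. The helper `induced_edges` and the toy node/edge data are hypothetical, and it assumes the test graph contains both old and new nodes, as described earlier in the thread.

```python
def induced_edges(edges, keep):
    """Keep only edges whose two endpoints are both in `keep` (hypothetical helper)."""
    keep = set(keep)
    return [(u, v) for u, v in edges if u in keep and v in keep]


# Toy data: nodes 0-2 exist at training time, nodes 3-4 arrive later.
old_nodes = [0, 1, 2]
new_nodes = [3, 4]
edges = [(0, 1), (1, 2), (2, 3), (3, 4), (0, 4)]

# Training graph: old nodes only, so no old-new or new-new edge can be sampled.
train_edges = induced_edges(edges, old_nodes)

# Test graph (assumption): old + new nodes together.
test_edges = induced_edges(edges, old_nodes + new_nodes)

# Edges that never appear during training.
unseen_edges = [e for e in test_edges if e not in train_edges]
```

Each edge list could then be turned into its own graph object and sampled separately, so the training sampler never touches edges that involve new nodes.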