Negative Sampling in hetero-RGCN for Link Prediction

Hi, very helpful library!

I’m a newbie trying to use hetero-RGCN (implemented by dgl.nn.pytorch.conv.RelGraphConv) for a link prediction task.

I loaded my custom data into a DGLHeteroGraph with several node types and edge types, and would like to predict the probability that an edge of one particular type exists. To train the model, I need an edge sampler. Is there a negative sampler API that can operate on a DGLHeteroGraph directly? Would you mind giving an example?

I noticed that dgl.data.LinkPredDataLoader (from link) seems to solve the problem. Is it one of the features in DGL v0.5?

Thank you very much! :grinning:

You can look at the code here:

Since you are doing negative sampling for only one edge type, you can simplify the code a lot.

Hi, I am using your code to do hetero graph link prediction in mini-batch mode.

I have some questions about your negative sampling implementation. It seems that all negative heads and tails are picked randomly from all nodes in the heterograph, regardless of the source and destination node types of the edge. Since the distribution of node types is extremely imbalanced, this strategy may produce many easy negatives, which do not help the model learn a better representation of the relation.

I have also noticed a chunk size hyperparameter for sharing negative edges among positive pairs when calculating the negative score. I guess it is designed to speed up training and validation, and indirectly to increase the negative sample size.
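To make the chunk idea concrete, here is how I understand it, as a minimal sketch in plain Python rather than the actual sampler code. The function name and all parameters are hypothetical, and negatives are drawn uniformly from all nodes, as in the behaviour described above.

```python
import random

def sample_shared_negatives(num_pos, chunk_size, neg_per_chunk, num_nodes, seed=0):
    """Return, for each positive edge index, the negative tail IDs it is scored against.

    Positives are split into consecutive chunks. One set of negatives is
    drawn per chunk and shared by every positive edge in that chunk.
    """
    rng = random.Random(seed)
    shared = []
    for start in range(0, num_pos, chunk_size):
        # One draw per chunk, not one draw per positive edge.
        negs = [rng.randrange(num_nodes) for _ in range(neg_per_chunk)]
        chunk_len = min(chunk_size, num_pos - start)
        # Every positive in the chunk points at the same list of negatives.
        shared.extend([negs] * chunk_len)
    return shared

negatives = sample_shared_negatives(num_pos=8, chunk_size=4, neg_per_chunk=3, num_nodes=100)
```

With 8 positives, a chunk size of 4, and 3 negatives per chunk, only 6 node IDs are drawn instead of the 24 that per-edge sampling would need, while each positive is still compared against 3 negatives.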

Any suggestions or example code on how to sample high-quality negatives with type constraints without slowing down training? @classicsong

That sampler is mainly designed for knowledge graphs (only one node type).
If you care more about model performance, I suggest using dgl.dataloading.EdgeDataLoader.

And there are some examples there.
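To illustrate what a type constraint means independently of any DGL version, here is a minimal sketch in plain Python. It is not the EdgeDataLoader implementation: the function name, the node types, and the counts are made up for illustration. The idea is to corrupt only the tail of each positive edge and to draw the replacement from nodes of the edge's destination type. I believe dgl.dataloading.negative_sampler.Uniform behaves along these lines, but please check the documentation for your version.

```python
import random

def typed_negative_tails(pos_edges, canonical_etype, num_nodes_by_type, k, seed=0):
    """Corrupt the tail of each positive edge with nodes of the correct destination type.

    pos_edges: list of (src_id, dst_id) pairs, with IDs local to their node type.
    canonical_etype: (src_type, relation, dst_type) triple.
    num_nodes_by_type: dict mapping node type name to node count.
    k: number of negatives per positive edge.
    """
    rng = random.Random(seed)
    _, _, dst_type = canonical_etype
    num_dst = num_nodes_by_type[dst_type]
    known = set(pos_edges)
    negatives = []
    for src, _dst in pos_edges:
        drawn = 0
        # Rejection sampling: assumes the graph is sparse, so that each source
        # node has plenty of non-neighbours among the destination-type nodes.
        while drawn < k:
            cand = rng.randrange(num_dst)
            if (src, cand) in known:
                continue  # skip candidates that are real edges
            negatives.append((src, cand))
            drawn += 1
    return negatives

# Hypothetical graph: 5 "user" nodes, 1000 "item" nodes, one "clicks" relation.
counts = {"user": 5, "item": 1000}
pos = [(0, 10), (1, 20)]
neg = typed_negative_tails(pos, ("user", "clicks", "item"), counts, k=4)
```

Because candidates come only from the destination type, the imbalance between node types no longer affects which negatives are drawn. The cost per negative is a single random draw plus a set lookup, so it should not be noticeably slower than sampling from all nodes.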

As this is using outdated code I’m closing this thread. Please make a new topic if you have further questions. Thanks!