Negative Sampling in dgl.EdgeDataLoader

Hi there!

Consider the case where I have a heterograph whose edges are split into training, validation, and test sets, from which I created subgraphs of the heterograph. When I pass the training subgraph to the EdgeDataLoader, does the EdgeDataLoader somehow ensure that the generated negative edges are not part of the other subgraphs of the heterograph (the validation and test subgraphs)? A minimal sketch of the kind of setup I mean is below (the graph, edge IDs, and sampler parameters are placeholders, not my exact code):
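```python
import torch
import dgl

# Toy heterograph standing in for my training subgraph (hypothetical data).
train_g = dgl.heterograph({
    ('user', 'clicks', 'item'): (torch.tensor([0, 0, 1]),
                                 torch.tensor([0, 1, 2])),
})
train_eids = {('user', 'clicks', 'item'): torch.arange(3)}

sampler = dgl.dataloading.MultiLayerFullNeighborSampler(2)
dataloader = dgl.dataloading.EdgeDataLoader(
    train_g, train_eids, sampler,
    # Uniform(k) draws k negatives per positive edge by corrupting the
    # destination node uniformly at random -- it only looks at train_g.
    negative_sampler=dgl.dataloading.negative_sampler.Uniform(5),
    batch_size=2, shuffle=True)

for input_nodes, pos_graph, neg_graph, blocks in dataloader:
    pass  # pos_graph holds true edges, neg_graph the sampled negatives
```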

The reason I am asking is that I built an R-GCN (relational graph convolutional network) on top of the DGL classes, and when doing link prediction I often get an AUC below 0.5. This makes me wonder whether something in the model is incorrect, so I figured it might have to do with the data and its labeling.
Since a negative edge generated by the EdgeDataLoader could in fact be a positive edge (from another subgraph of the heterograph), this could produce wrong labels and therefore a low AUC.

I would be very happy to get some opinions on this and any help is welcome!

EdgeDataLoader does not ensure that negative edges are absent from the validation or test sets. In the literature on recommendation with implicit feedback I have seen both approaches, i.e. one that ensures negative samples do not appear in the training/validation/test set, and one that does not. In general they do not make much difference, especially if the dataset is large.

If your dataset is small, however, then they could make a difference. In that case, I would suggest using has_edges_between to determine whether an edge exists between two nodes in a graph. You can then mask the loss terms with the result of has_edges_between so that the loss terms of those false-negative examples do not backpropagate, as in the sketch below.
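Here is a minimal sketch of that masking, assuming per-edge logits neg_score for the sampled negatives of one canonical etype and a full_g heterograph containing all splits (masked_negative_loss and the variable names are hypothetical):

```python
import torch
import torch.nn.functional as F

def masked_negative_loss(neg_score, neg_graph, full_g, etype):
    """Loss over sampled negatives, skipping false negatives.

    neg_score : (E,) logits for the negative edges of this etype
    neg_graph : negative graph produced by the dataloader's negative sampler
    full_g    : the original heterograph containing train + val + test edges
    """
    u, v = neg_graph.edges(etype=etype)
    # True where a sampled "negative" pair is actually a real edge somewhere
    # in the full graph, i.e. a false negative.
    false_neg = full_g.has_edges_between(u, v, etype=etype).bool()
    keep = (~false_neg).float()
    # Label 0 for negatives; keep per-edge terms so we can mask them.
    per_edge = F.binary_cross_entropy_with_logits(
        neg_score, torch.zeros_like(neg_score), reduction='none')
    # Average only over the genuine negatives; the masked terms contribute
    # no gradient.
    return (per_edge * keep).sum() / keep.sum().clamp(min=1.0)
```

Since the masked terms are multiplied by zero, the false negatives neither move the loss nor receive gradients, while the batch shape stays unchanged.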
