Hi! I’m new to graph ML and GNNs, and I’m trying to perform link prediction with neighbor sampling on a bipartite, undirected transaction graph. The graph consists of credit card nodes and merchant nodes, and the edges are transactions, as shown in the figure. The goal is to predict which credit cards and merchants should be connected. To make the heterograph undirected, I add the reverse edge like this:
dgl.heterograph({('card', 'transaction', 'merchant'): (card_nodes, merch_nodes), ('merchant', 'transaction-rev', 'card'): (merch_nodes, card_nodes)})
I’m trying to follow the tutorial “6.3 Training GNN for Link Prediction with Neighborhood Sampling” (DGL 0.6.1 documentation), but I have a few questions:
- I’m confused about whether I should predict only the `transaction` edge or both it and its reverse edge. Should I pass only the `('card', 'transaction', 'merchant')` edge type in `train_eid_dict`, or should I also pass the reverse edge type `('merchant', 'transaction-rev', 'card')` to the `EdgeDataLoader`?
- I also need negative sampling, since this is link prediction. I want the negative edges to connect a card to a random merchant it is not actually connected to in the graph and, similarly, a merchant to a random card it is not connected to. Will `dgl.dataloading.negative_sampler.Uniform` accomplish this?
- I have also been going through the `EdgeDataLoader` docs and the examples on the `dgl.dataloading` page of the DGL 0.6.1 documentation, where they exclude the reverse edges in a heterograph. If I’m doing undirected link prediction, should I be excluding the reverse edges? Any help/guidance here would be really appreciated! Thanks!
- Finally, I want to do a train/validation/test split, but I’m not sure how to proceed. The DGL docs don’t seem to have an example of the split for link prediction with neighbor sampling. I suppose I could randomly sample 80% of the edges for training, 10% for validation, and 10% for testing, but again I’m confused about how to handle the reverse edges for undirected link prediction. It also seems like I should be using `g_sampling` somehow for the validation and test sets, but I’m not sure how, because there is no example in the DGL docs. Can someone please guide me on this?
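
To make the split question concrete, here is a rough plain-Python sketch of what I have in mind. It makes no DGL calls, and `split_edge_ids` and the toy arrays are names I made up for illustration. It assumes that, because both edge types are built from the same `card_nodes`/`merch_nodes` arrays in the same order, edge `i` of `transaction` and edge `i` of `transaction-rev` describe the same transaction. So I split the forward edge IDs once and reuse the same IDs for the reverse type. I’m not sure this is the recommended approach.

```python
import random


def split_edge_ids(num_edges, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle edge IDs 0..num_edges-1 and cut them into train/val/test lists."""
    ids = list(range(num_edges))
    random.Random(seed).shuffle(ids)
    n_train = int(train_frac * num_edges)
    n_val = int(val_frac * num_edges)
    return {
        "train": ids[:n_train],
        "val": ids[n_train:n_train + n_val],
        "test": ids[n_train + n_val:],
    }


# Toy data (hypothetical): transaction i connects card_nodes[i] to merch_nodes[i].
card_nodes = [0, 0, 1, 2, 2, 3, 4, 4, 5, 5]
merch_nodes = [0, 1, 1, 0, 2, 2, 0, 1, 2, 0]

splits = split_edge_ids(len(card_nodes))

# Assumption: the reverse edge type was built from the same arrays in the same
# order, so a transaction has the same edge ID under both edge types.
eid_dicts = {
    name: {
        ("card", "transaction", "merchant"): eids,
        ("merchant", "transaction-rev", "card"): eids,
    }
    for name, eids in splits.items()
}

for name, eids in splits.items():
    print(name, sorted(eids))
```

With a split like this, a forward edge and its reverse always land in the same set, so a held-out transaction cannot leak into training through its reverse edge. My guess is that the graph used for sampling would then contain only the training edges of both types, but I don’t know whether that is right.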