A problem with R-GCN for link prediction

Hi,
I am using the PyTorch version of the relational GCN (R-GCN) example for link prediction (dgl/examples/pytorch/rgcn at master · dmlc/dgl · GitHub), but there is a part of the code I can't understand.

data = FB15k237Dataset(reverse=False)
g = data[0]
num_nodes = g.num_nodes()
num_rels = data.num_rels
train_g = get_subset_g(g, g.edata["train_mask"], num_rels)
test_g = get_subset_g(g, g.edata["train_mask"], num_rels, bidirected=True)
……
print("Testing...")
checkpoint = torch.load(model_state_file)
model = model.cpu()  # test on CPU
model.eval()
model.load_state_dict(checkpoint["state_dict"])
embed = model(test_g, test_nids)
best_mrr = calc_mrr(
    embed, model.w_relation, test_mask, triplets, batch_size=500
)

Why is test_g built with train_mask (test_g = get_subset_g(g, g.edata["train_mask"], num_rels, bidirected=True)), and why do the later steps in training and the test-set evaluation also use the test_g defined this way?
I think test_g should be defined by test_mask instead, i.e. test_g = get_subset_g(g, g.edata["test_mask"], num_rels, bidirected=True).

Perhaps I am misunderstanding something?
Could someone help me with this question?
Thanks a lot!

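Link prediction needs a graph for message passing in order to obtain the node representations. That graph must not contain the edges you want to predict; otherwise the model sees the answers while computing the embeddings, which is information leakage. So the graph used for message passing at test time must not contain the edges indicated by test_mask, which is why test_g is built from train_mask. In the snippet you posted, test_mask is still used, but as far as I can tell only inside calc_mrr, where it picks out which triplets are scored.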
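
To make the leakage point concrete, here is a minimal sketch in plain Python on made-up toy triplets. It is not the DGL implementation: subset_edges is a hypothetical stand-in for get_subset_g, and giving each reverse edge the relation id rel + num_rels is an assumption made for illustration.

```python
# Toy data: each edge is a (head, relation, tail) triplet. Values are made up.
num_rels = 2
triplets = [
    (0, 0, 1),
    (1, 1, 2),
    (2, 0, 3),
    (0, 1, 3),  # held out for testing
    (3, 0, 1),  # held out for testing
]
train_mask = [True, True, True, False, False]
test_mask = [False, False, False, True, True]


def subset_edges(triplets, mask, num_rels, bidirected=False):
    """Hypothetical stand-in for get_subset_g: keep only the masked edges."""
    kept = [t for t, keep in zip(triplets, mask) if keep]
    if bidirected:
        # Assumption: a reverse edge gets its own relation id (rel + num_rels).
        reverse = [(t, r + num_rels, h) for (h, r, t) in kept]
        kept = kept + reverse
    return kept


def leaked(graph_edges, targets):
    """Return the target triplets that already appear in the graph."""
    present = set(graph_edges)
    return [t for t in targets if t in present]


# Triplets we want to predict at test time.
eval_triplets = subset_edges(triplets, test_mask, num_rels)

# Message-passing graph built from train_mask: no test edge is visible.
safe_graph = subset_edges(triplets, train_mask, num_rels, bidirected=True)

# Message-passing graph built from test_mask: every test edge is visible.
leaky_graph = subset_edges(triplets, test_mask, num_rels, bidirected=True)

print(len(leaked(safe_graph, eval_triplets)), len(leaked(leaky_graph, eval_triplets)))
```

With the train_mask graph none of the evaluation triplets are present, so the embeddings cannot encode them directly. With the test_mask graph every evaluation triplet is already an edge, so a high score would only show that the model can read back what it was given.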

Okay, thank you for your answer. I’ll think about it again.
