Hi,
I am using the PyTorch version of the relational GCN (RGCN) example for link prediction (dgl/examples/pytorch/rgcn at master · dmlc/dgl · GitHub), but there is one part of the code I don't understand:
data = FB15k237Dataset(reverse=False)
g = data[0]
num_nodes = g.num_nodes()
num_rels = data.num_rels
train_g = get_subset_g(g, g.edata["train_mask"], num_rels)
test_g = get_subset_g(g, g.edata["train_mask"], num_rels, bidirected=True)
……
print("Testing...")
checkpoint = torch.load(model_state_file)
model = model.cpu() # test on CPU
model.eval()
model.load_state_dict(checkpoint["state_dict"])
embed = model(test_g, test_nids)
best_mrr = calc_mrr(
    embed, model.w_relation, test_mask, triplets, batch_size=500
)
Why is test_g built from train_mask (test_g = get_subset_g(g, g.edata["train_mask"], num_rels, bidirected=True))? Both the subsequent training process and the test-set evaluation use the test_g defined this way.
I would expect test_g to be defined with "test_mask", i.e. test_g = get_subset_g(g, g.edata["test_mask"], num_rels, bidirected=True).
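To make the two options concrete, here is a toy sketch in plain Python of what I mean. It does not use DGL: subset_edges is a hypothetical stand-in for get_subset_g (not its actual implementation), and the edge list and masks are made up for illustration.

```python
# Toy illustration only. subset_edges is a hypothetical stand-in for
# get_subset_g; the triplets and masks below are invented.

def subset_edges(edges, mask, bidirected=False):
    """Keep the (src, rel, dst) triplets whose mask entry is True."""
    kept = [e for e, keep in zip(edges, mask) if keep]
    if bidirected:
        # Assumption: a reverse edge is added for every kept edge.
        # The relation id is left unchanged here for simplicity.
        kept = kept + [(dst, rel, src) for (src, rel, dst) in kept]
    return kept


edges = [(0, 0, 1), (1, 1, 2), (2, 0, 3), (3, 1, 0)]
train_mask = [True, True, True, False]
test_mask = [False, False, False, True]

# Option A (what the example script does): graph built from training edges.
graph_from_train = subset_edges(edges, train_mask, bidirected=True)

# Option B (what I expected): graph built from test edges.
graph_from_test = subset_edges(edges, test_mask, bidirected=True)

print(len(graph_from_train), len(graph_from_test))
```

With option A the test triplet (3, 1, 0) is not part of the graph that the model runs on, while with option B the graph consists only of the test triplet and its reverse.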
Am I misunderstanding something?
Could someone help me with this question?
Thanks a lot!