# RGCN Exception on Link Prediction

I was trying to use the RGCN implementation for link prediction, but it does not work.
The problem arises when I try to evaluate the model. With FB15k-237 it works, but with wn18 it does not. Namely, it crashes in the function `perturb_and_get_rank`. This issue is exactly the same as in this question, except that I am using a built-in dataset instead of a custom one.

The main issue, I think, is the way the number of batches is calculated: `n_batch = (num_entity + batch_size - 1) // batch_size`
The comment says that we want to perturb one element of each triple, but the number of triples differs from the number of entities. Therefore, when `num_entity < num_triples` we skip some triples, and when `num_entity > num_triples` we run empty iterations.
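A quick sketch of the mismatch (the numbers below are made up for illustration; they are not taken from the actual datasets):

```python
# Hypothetical illustration of the batching mismatch described above.
num_entity = 40943      # entity count (e.g. wn18 has ~40k entities)
num_triples = 5000      # the evaluation split can be much smaller
batch_size = 128

# Current code derives the batch count from the entity count...
n_batch = (num_entity + batch_size - 1) // batch_size
print(n_batch)          # 320

# ...but each batch slices the *triples* tensor, so once the batch start
# passes num_triples, every remaining batch slices an empty range.
last_covered = (n_batch - 1) * batch_size
print(last_covered >= num_triples)   # True: the tail batches are empty
```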

Could you, please, comment on why the number of batches is calculated this way?
Also, could you, please, explain why perturbation leads to a correct calculation of MRR?

Thanks.

Hi,

This seems like a bug. Could you raise an issue on our GitHub issue list? Please also provide more information about the `target` parameter in `target.view(-1, 1)`.

Do you mean that the whole way of calculating the batch size is a bug, or specifically the way `target.view(-1, 1)` works?

I figured out a fix, so it might be related to PyTorch itself rather than DGL. The fix is to use `target.reshape(-1, 1)` instead of `view`. Also, this problem arises only on the second call, which I find surprising. What I mean is the following:

```python
# perturb subject - this works fine
ranks_s = perturb_and_get_rank(embedding, w, o, r, s, num_entity, eval_bz)
# perturb object - this breaks upon the first empty target
ranks_o = perturb_and_get_rank(embedding, w, s, r, o, num_entity, eval_bz)
```
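For context on why swapping `view` for `reshape` can help, here is a minimal, standalone illustration (not the DGL code itself): `view` requires the new shape to be expressible over the tensor's existing strides, while `reshape` falls back to making a copy when it is not.

```python
import torch

# A transpose makes the tensor non-contiguous, so `view` cannot
# reinterpret the memory without a copy and raises RuntimeError.
t = torch.arange(6).reshape(2, 3).t()
try:
    t.view(-1)
    raised = False
except RuntimeError:
    raised = True

# `reshape` handles the same case by copying when necessary.
flat = t.reshape(-1)
print(raised, flat.tolist())   # True [0, 3, 1, 4, 2, 5]
```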


Information on the `target` parameter from the VS Code debugger:

```
tensor([], dtype=torch.int64)
data: tensor([], dtype=torch.int64)
device: device(type='cpu')
dtype: torch.int64
is_cuda: False
is_leaf: True
is_quantized: False
is_sparse: False
layout: torch.strided
name: None
output_nr: 0
shape: torch.Size([0])
_backward_hooks: None
_base: tensor([[29666,    17, 23080],
        [37606,     8,  5371],
        [ 9631,    15,  3711],
        ...,
        [18235,    12,  4842],
        [28112,     0, 40341],
        [28502,    12, 22178]])
_cdata: 94160866827968
_version: 0
```


@lingfan for visibility.

Hi @askliar, I agree the code here is problematic. Allow me some time to look into this and I will get back to you ASAP.

Thank you @lingfan, let me know if I can help in any way. Also, if you have any references for how to properly measure MRR using perturbations, could you, please, share them?

Hi @askliar, I think you are right about the bug. `num_entity` here should instead be the number of triplets.
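In other words, the batch count should come from the evaluation triples, along the lines of this sketch (the numbers are made up; `num_triples` would come from the evaluation split in the real code):

```python
# Hypothetical fix: derive the batch count from the number of
# evaluation triples rather than from the number of entities.
num_triples, batch_size = 5000, 128
n_batch = (num_triples + batch_size - 1) // batch_size   # ceil division
print(n_batch)   # 40
```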

I’ve never thoroughly investigated how people calculate MRR. I think I followed someone else’s implementation (probably the RGCN authors’) and rewrote it based on my understanding. But you can definitely look into any paper that uses MRR as a performance metric.

I agree that the evaluation code is tricky to understand. My idea is:
the way you calculate MRR is, for each ground-truth triplet (s, r, o), you try all possible o' in [0, num_nodes) and calculate the score of (s, r, o'). That score is treated as the “plausibility” of (s, r, o'). Then you sort those scores to determine the rank of the ground-truth o. You also need to do the same thing for the subject. Then you can calculate MRR based on its definition.
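The per-triplet procedure can be sketched like this (a DistMult-style score is assumed; `emb` and `w` are hypothetical entity and relation embedding matrices, not the ones from the DGL example):

```python
import torch

def rank_of_object(emb, w, s, r, o):
    # score(s, r, o') for every candidate object o': shape [num_nodes]
    scores = (emb[s] * w[r]) @ emb.t()
    # rank of the ground-truth o among all candidates (1-based)
    _, order = scores.sort(descending=True)
    return (order == o).nonzero().item() + 1

torch.manual_seed(0)
emb = torch.randn(5, 8)   # 5 entities, embedding dim 8
w = torch.randn(3, 8)     # 3 relations
rank = rank_of_object(emb, w, s=0, r=1, o=2)
# MRR averages 1/rank over all test triplets (here just one)
mrr = 1.0 / rank
```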

However, the challenge here is how you do this efficiently.

• for each triplet, you need to perturb both the subject and the object,
• you need to try every node from 0 to num_nodes - 1 when you do the perturbation,
• doing everything all at once will probably lead to out-of-memory issues.

Here I exploited the fact that when you perturb the subject, the object and relation remain unchanged, and the same holds for the subject and relation when you perturb the object. Let’s say we are perturbing the object. What I do is partition the triplets into mini-batches; for each mini-batch, I first compute the element-wise product of the subject embeddings and the relation embeddings, and then calculate the scores for every possible o' via a matrix product with the node embedding matrix.
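That batched trick can be sketched as follows (same assumptions as before: DistMult-style scoring, with `emb` and `w` as hypothetical embedding matrices):

```python
import torch

def batched_object_scores(emb, w, s, r, batch_size=2):
    all_scores = []
    for start in range(0, s.shape[0], batch_size):
        sb, rb = s[start:start + batch_size], r[start:start + batch_size]
        # element-wise product of subject and relation embeddings: [B, D]
        sr = emb[sb] * w[rb]
        # scores of (s, r, o') for all candidate objects at once: [B, num_nodes]
        all_scores.append(sr @ emb.t())
    return torch.cat(all_scores)

torch.manual_seed(0)
emb, w = torch.randn(6, 4), torch.randn(2, 4)
s = torch.tensor([0, 1, 2, 3, 4])
r = torch.tensor([0, 1, 0, 1, 0])
scores = batched_object_scores(emb, w, s, r)
print(scores.shape)   # torch.Size([5, 6])
```

Keeping `batch_size` small bounds the peak memory at `batch_size * num_nodes` scores per step, which is what avoids the out-of-memory issue above.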