Differences between RGCN implementation and the paper


I’m currently reading the paper and RGCN implementation and I found a few differences. I want to double check if I need to add them myself to get the original RGCN or if it’s already there and I’m missing something.

There is a W^{(l)}_{0} in the paper which I can’t find it in the code.
There is a sigmoid (non-linearity) in the paper.

Can you please provide more info on these?

Hi @bfatemi,

The RGCN implementation in DGL matches the author’s code with best efforts. The two points you mentioned are both implemented in the model.

The W^{(l)}_{0} in equation (2) is implemented as the loop message. See code here.

The sigmoid is also implemented in the link prediction example. The line here uses binary_cross_entropy_with_logits function from PyTorch, and if you check out PyTorch’s doc, it says:

This loss combines a Sigmoid layer and the BCELoss in one single class.

1 Like

On a related note, I’m confused with the code used to calculate scores for ranking, specifically in the perturb_and_get_rank() function,

batch_a = a[batch_start: batch_end]
batch_r = r[batch_start: batch_end]
emb_ar = embedding[batch_a] * w[batch_r]
emb_ar = emb_ar.transpose(0, 1).unsqueeze(2) # size: D x E x 1
emb_c = embedding.transpose(0, 1).unsqueeze(1) # size: D x 1 x V
# out-prod and reduce sum
out_prod = torch.bmm(emb_ar, emb_c) # size D x E x V
score = torch.sum(out_prod, dim=0) # size E x V
score = torch.sigmoid(score)

This seems like it’s calculating a score for each of the possible o in (s, r, o), which is predicting entities rather than links. In order to perform link prediction, I imagine that you would have to do something like this instead,

emb_a = entity_embedding[batch_a].unsqueeze(2) # s: batch_size x hidden_dim x 1
emb_c = entity_embedding[batch_c].unsqueeze(2) # o: batch_size x hidden_dim x 1
emb_r = relation_embedding.t().unsqueeze(0) # r: 1 x hidden_dim x num_relations
scores = torch.sum(emb_a * emb_r * embed_c, dim=1) # batch_size x num_relations

Which would then, for each object/subject pair, give you all of the scores of relations similar to the calc_score() function.

Is this right?

What’s the batch_a and batch_c in your suggestion?

I would imagine that in your code, the cartesian product of batch_a and batch_c would cover all possible subject node and object node pairs right? Then the problem is what metric would you use for evaluation after you get all those scores.

The paper uses MRR and HITS as evaluation metric (which is also used in DistMult paper), which I guess is the reason for making the link prediction more like entity prediction.

The batch_a and batch_c are the subject and object, or entity 1 and entity 2, from the triplets. For example, if the input data is,

[[e1, r2, e2], 
 [e3, r4, e4], 
 [e5, r6,e6], 

Then batch_a would be [e1,e3,e5], and batch_b would be [e2,e4,e6].

So for each triplet, you get num_rel number of scores, calculated the same way as the loss,

# how the loss is calculated in the current RGCN code
e1 = entity_emb[triplets[:, 0]]
r = self.rel_W[triplets[:, 1]]
e2 = entity_emb[triplets[:, 2]]
score = torch.sum(e1 * r * e2, dim=1)

Each of the triplets corresponding to a relation, and you would rank them the exact same way the code current ranks all of the entities and calculate MRR/hits.

For example, to rank,

# same as current ranking code
_, indices = scores.sort(dim=1, descending=True)
match = indices == label.view(-1, 1)
ranks = match.nonzero()

return ranks[:, 1] + 1  # index from nonzero() starts at 0

And to calculate MRR,

rr = 1.0 / ranks.float()
return rr.mean()

This way you get the MRR for the relations instead of the entities, which makes more sense to me as “link prediction”.

Let me know if this makes sense, or if I’m missing something. Thanks!

Are you suggesting for each positive sample, we only try different relation type? If that’s case, if there is currently no link between two entities in the knowledge, would you be able to find out any potential relation between these two entities?

Yeah, I think so. For training, you are maximizing the score of the specific triplet combination, right? So it makes sense to just source and destination with all of the existing relations during inference. So for a given pair of source and destination nodes, you have a list of scores, with the length of the list being the same as the number of relations. Because the currently implemented inference algorithm looks and ranks the destination node, which isn’t really doing link prediction.

I’m currently slowly going through the author’s code, and haven’t gotten to that part yet, but from the paper, that seems to be what they’re suggesting as well.

The decoder reconstructs edges of the graph relying on the vertex representations; in other words, it scores (subject, relation, object)-triples through a function s : Rd × R × Rd → R.