PinSage example implementation differs from the paper


I have been looking into the PinSage PyTorch example code and noticed some differences between the implementation and the paper.

For generating node embedding vectors, the paper provides the following algorithm (shown in the screenshot).

There you can see that, after the two GCN layers (the ‘CONVOLVE’ operations), another fully connected layer is applied (lines 15 to 17 in the screenshot).

In the code, I see the following implementation, which I cannot map to any step in the paper. In particular, I can’t understand the purpose of the “return h_item_dst + self.sage(blocks, h_item)” line in the get_repr function. Why are the node features of the “DST” nodes added to the output of the GCN layers?

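import torch.nn as nn

import layers  # local module from the PinSAGE example


class PinSAGEModel(nn.Module):
    def __init__(self, full_graph, ntype, textsets, hidden_dims, n_layers):
        super().__init__()  # required before registering submodules

        self.proj = layers.LinearProjector(full_graph, ntype, textsets, hidden_dims)
        self.sage = layers.SAGENet(hidden_dims, n_layers)
        self.scorer = layers.ItemToItemScorer(full_graph, ntype)

    def forward(self, pos_graph, neg_graph, blocks):
        h_item = self.get_repr(blocks)
        pos_score = self.scorer(pos_graph, h_item)
        neg_score = self.scorer(neg_graph, h_item)
        # Hinge loss: positive pairs should outscore negative pairs by at least 1
        return (neg_score - pos_score + 1).clamp(min=0)

    def get_repr(self, blocks):
        # Project the input features of the first block's source nodes
        h_item = self.proj(blocks[0].srcdata)
        # Project the features of the last block's destination (seed) nodes
        h_item_dst = self.proj(blocks[-1].dstdata)
        # Add the seed nodes' own projections to the convolution output
        return h_item_dst + self.sage(blocks, h_item)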


Indeed, my implementation differs a bit from the paper, mainly because the dataset I was working on is very different from Pinterest’s.

The CONVOLVE operations are implemented in the layers.SAGENet module.

Lines 15 to 17 simply project the PinSAGE output through an MLP to produce the final item representations.
However, when I compute item representations, I add the PinSAGE output (self.sage(blocks, h_item)) to the seed nodes’ own embeddings (h_item_dst). Also, note that the PinSAGE paper does not assign each item a learnable embedding, while in my implementation each node does get one. This is mainly because I found it to work better on traditional recommender system datasets like MovieLens or Nowplaying-RS, where the features are not as rich and distinguishing as Pinterest’s.
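
To make the contrast concrete, here is a minimal sketch in plain Python (no PyTorch or DGL) of the two ways to turn the convolution output into a final item representation. The paper-style head is assumed to have the form z = G2 · ReLU(G1 · h + g); the helper names (paper_head, example_repr) and all numeric values are made up for illustration and are not part of the example code.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, x) for x in v]


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def add(u, v):
    """Element-wise sum of two vectors of equal length."""
    return [a + b for a, b in zip(u, v)]


def paper_head(h_conv, G1, g, G2):
    """Paper-style head (assumed form): z = G2 . ReLU(G1 . h_conv + g)."""
    return matvec(G2, relu(add(matvec(G1, h_conv), g)))


def example_repr(h_own, h_conv):
    """Example-style output: the seed node's own projection plus the conv output."""
    return add(h_own, h_conv)


# Toy values, chosen only so the arithmetic is easy to follow.
h_conv = [0.5, -1.0]             # stands in for self.sage(blocks, h_item)
h_own = [1.0, 2.0]               # stands in for h_item_dst
G1 = [[1.0, 0.0], [0.0, 1.0]]    # hypothetical weights
g = [0.0, 0.0]                   # hypothetical bias
G2 = [[2.0, 0.0], [0.0, 2.0]]    # hypothetical weights

z_paper = paper_head(h_conv, G1, g, G2)
z_example = example_repr(h_own, h_conv)
print(z_paper)    # [1.0, 0.0]
print(z_example)  # [1.5, 1.0]
```

In the second form the node's own embedding always contributes directly to the output, which is one plausible reason it helps when the input features alone are not very distinguishing.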

The relevance score of an item pair is computed by self.scorer().
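
As a rough illustration of how a pair score feeds into the loss returned by forward, the sketch below assumes a plain dot-product score between two item embeddings (the actual ItemToItemScorer may do more than this) and applies the same margin-1 hinge as (neg_score - pos_score + 1).clamp(min=0). The function names and vectors are hypothetical.

```python
def dot(u, v):
    """Dot product of two vectors of equal length."""
    return sum(a * b for a, b in zip(u, v))


def hinge_loss(pos_score, neg_score, margin=1.0):
    """Zero once the positive pair outscores the negative pair by `margin`."""
    return max(0.0, neg_score - pos_score + margin)


# Toy embeddings, made up for illustration.
query = [1.0, 0.0]
pos_item = [0.5, 0.5]     # an item that should be scored as relevant
easy_neg = [-1.0, 0.0]    # clearly dissimilar item
hard_neg = [1.0, 1.0]     # negative that scores higher than the positive

pos_score = dot(query, pos_item)
easy_loss = hinge_loss(pos_score, dot(query, easy_neg))
hard_loss = hinge_loss(pos_score, dot(query, hard_neg))
print(easy_loss)  # 0.0
print(hard_loss)  # 1.5
```

Only the hard negative contributes to the loss here; the easy one already sits beyond the margin.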

Another difference is that the PinSAGE paper additionally has a labeled set of relevant item pairs and trains the model in a supervised fashion. Traditional recommender system datasets do not have such labels, so I adapted it to an unsupervised model that tries to predict whether two items have been interacted with by the same user.
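
To show what "co-interacted" means as a training signal, here is a small sketch that derives positive item pairs from a user-to-items mapping. It only illustrates the label definition; the example code may sample pairs differently, and the function name and toy data are assumptions.

```python
from itertools import combinations


def co_interacted_pairs(user_items):
    """Return the set of item pairs that share at least one user.

    Each pair is stored as a sorted tuple so (a, b) and (b, a) are the same pair.
    """
    pairs = set()
    for items in user_items.values():
        for a, b in combinations(sorted(set(items)), 2):
            pairs.add((a, b))
    return pairs


def is_positive(a, b, pairs):
    """True if items a and b were interacted with by the same user."""
    return tuple(sorted((a, b))) in pairs


# Toy interaction data, made up for illustration.
user_items = {
    "u1": ["A", "B"],
    "u2": ["B", "C"],
    "u3": ["C", "D"],
}

positives = co_interacted_pairs(user_items)
print(sorted(positives))                # [('A', 'B'), ('B', 'C'), ('C', 'D')]
print(is_positive("B", "A", positives))  # True
print(is_positive("A", "D", positives))  # False
```

Pairs outside this set, such as ("A", "D"), can then serve as negatives for the margin loss.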

Hopefully that addresses your question. I’ll also note the difference from the paper in the README document.