How to use a hinge loss in a heterogeneous graph

Is it possible to implement a hinge loss function in DGL? I’m trying to implement a loss function similar to the one the Uber folks implemented in the following article: https://eng.uber.com/uber-eats-graph-learning/

My goal is to implement a kind of triplet loss, where I sample the top-K and bottom-K neighbors of each node based on Personalized PageRank (or other structural properties) and then use these triplets to calculate the loss.

I’m working on a link prediction problem and using a heterogeneous graph. I’m using the code presented in the WWW20-hands-on-tutorial as a base.

It is possible. If you intend to base your code on the WWW tutorial, you need to change two places:

  1. In LinkPredictionMinibatchSampler, you can see that in Step 2 we compact two graphs: the positive graph consisting of all the positive edges pos_pair_graph, and the negative graph consisting of all the negative edges neg_pair_graph. Instead, you will have to compact three graphs: pos_pair_graph, neg_pair_graph, and another “low-rank-positive” graph pos_pair_graph2.
  2. In the training loop
            for pair_graph, blocks in t:
                user_emb, item_emb = model(blocks)
                prediction = model.compute_score(pair_graph, user_emb, item_emb)
                predictions.append(prediction)
                ratings.append(pair_graph.edata['rating'])
    
    Instead, the iterator will yield pos_pair_graph, neg_pair_graph, and pos_pair_graph2 along with blocks. After you compute user_emb and item_emb, you can compute three score vectors:
                pos_score = model.compute_score(pos_pair_graph, user_emb, item_emb)
                neg_score = model.compute_score(neg_pair_graph, user_emb, item_emb)
                pos_score2 = model.compute_score(pos_pair_graph2, user_emb, item_emb)
    
    You can then compute the loss based on these three score vectors; see the sketch below.
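    For example, a hinge loss over the three score vectors could look like the following minimal sketch (the margin values, and the assumption that all three vectors have the same length, are illustrative rather than prescribed by the tutorial):

        import torch

        def hinge_loss(pos_score, neg_score, pos_score2, margin_neg=1.0, margin_low=0.1):
            # positive pairs should score higher than negative pairs by margin_neg
            loss_neg = (neg_score - pos_score + margin_neg).clamp(min=0).mean()
            # high-rank positives should score higher than low-rank positives
            # by the smaller margin_low
            loss_low = (pos_score2 - pos_score + margin_low).clamp(min=0).mean()
            return loss_neg + loss_low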

Thanks, @BarclayII !

Just one final question: is it possible, in each iteration (message passing), to use a random walk to sample a different set of neighbors (or edges) to take into consideration?

I’m looking at the neighborhood sampling documentation, but it seems it samples the neighbors just one time and then keeps using them. Is that correct?

No. The example code there (MultiLayerNeighborSampler or MultiLayerDropoutSampler) samples a different set of neighbors for each layer in every iteration; it does not keep using the same set of neighbors all the time. The sketch below illustrates this.
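A minimal sketch on a toy homogeneous graph (the graph, fan-outs, and batch size are illustrative assumptions; NodeDataLoader is the DGL 0.5-era API):

import dgl
import torch

g = dgl.rand_graph(100, 500)  # toy graph: 100 nodes, 500 random edges
sampler = dgl.dataloading.MultiLayerNeighborSampler([2, 2])
dataloader = dgl.dataloading.NodeDataLoader(
    g, torch.arange(100), sampler, batch_size=10, shuffle=True)
for epoch in range(2):
    for input_nodes, output_nodes, blocks in dataloader:
        # a fresh set of up to 2 neighbors per node per layer is drawn
        # here for every minibatch in every epoch
        pass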

@BarclayII

In my use case, I have node types A and B in a bipartite graph. I would like the message passing to occur between A <-> B, but I would also like my loss function to compare the embeddings of nodes of type A with the k most important neighbors of the same node type, and vice versa (using PinSAGE sampling, for example).

The big difference in this approach is that I don’t want to build projection graphs. If, for example, I use PinSAGE in my data loader, will the message passing happen on the original graph or on the projected graph?

My use case:

1 - Build a full heterogeneous graph with node types A and B

2 - Use the whole graph for message passing, in such a way that nodes of type A receive messages only from nodes of type B, and nodes of type B receive messages only from nodes of type A

3 - Compute a loss score based on a margin loss, comparing the dot product of the representations of nodes of each type with their most similar nodes of the same type (PinSAGE sampling) and with negative examples.

My first attempt, without success:

import dgl
import torch

class NeighborhoodLoss():

  def __init__(self, graph):
    self.graph = graph

  def sample_positive_neighbors(self, target_node_type, aux_node_type):
    # PinSAGE-style sampling: random walks through the auxiliary node type
    # pick the most frequently visited nodes of the target type
    sampler = dgl.sampling.PinSAGESampler(
        self.graph,
        target_node_type,
        aux_node_type,
        3,    # num_traversals
        0.5,  # termination_prob
        100,  # num_random_walks
        5     # num_neighbors
    )
    seeds = self.graph.nodes(target_node_type)
    frontier = sampler(seeds)
    # the frontier's edges connect each sampled neighbor to its seed node
    neighbors_ids, nodes_ids = frontier.all_edges(form='uv')
    return nodes_ids, neighbors_ids

  def sample_negative_neighbors(self, target_node_type, neg_samples=3):
    # pair every node with neg_samples uniformly drawn random nodes
    n_nodes = self.graph.number_of_nodes(target_node_type)
    nodes_ids = torch.arange(n_nodes).repeat(neg_samples)
    neg_neighbors_ids = torch.randint(0, n_nodes, (n_nodes * neg_samples,))
    return nodes_ids, neg_neighbors_ids
  
  def calculate_representation_dot_product(self, node_ids, neighbors_ids, target_node_type):
    # vectorized per-pair dot product between node and neighbor embeddings
    h = self.graph.nodes[target_node_type].data['h']
    nodes_representations = h[node_ids]
    neighbors_representations = h[neighbors_ids]
    dot_product = (nodes_representations * neighbors_representations).sum(dim=1)
    return dot_product
  
  def get_dot_products(self, node_types):
    # run once per node type, swapping the target and auxiliary node types
    types = [node_types, [node_types[1], node_types[0]]]
    loss = []
    for it in types:
      pos_nodes_ids, pos_neigh_ids = self.sample_positive_neighbors(it[0],it[1])
      neg_nodes_ids, neg_neigh_ids = self.sample_negative_neighbors(it[0])
      pos_dot_product = self.calculate_representation_dot_product(pos_nodes_ids, pos_neigh_ids, it[0])
      neg_dot_product = self.calculate_representation_dot_product(neg_nodes_ids, neg_neigh_ids, it[0])
      loss.append((pos_dot_product, neg_dot_product))
    return loss

Score function:

    def compute_score(self, pair_graph, user_embeddings, item_embeddings):
        with pair_graph.local_scope():
            pair_graph.nodes['user'].data['h'] = user_embeddings
            pair_graph.nodes['item'].data['h'] = item_embeddings
            ngh_loss = NeighborhoodLoss(pair_graph)
            scores = ngh_loss.get_dot_products(['item','user'])
            return scores

Loss function:

def compute_margin_loss(scores, margin=0.1):
    # scores is a list of (pos_score, neg_score) tuples, one per node type;
    # the positive and negative score vectors may have different lengths,
    # so compare their means inside the hinge
    loss = 0
    for pos_score, neg_score in scores:
        loss = loss + (neg_score.mean() - pos_score.mean() + margin).clamp(min=0)
    return loss
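Putting the pieces together, a hypothetical call inside the training loop could look like this (model, opt, and the embeddings follow the tutorial loop above):

scores = model.compute_score(pair_graph, user_emb, item_emb)
loss = compute_margin_loss(scores)
opt.zero_grad()
loss.backward()
opt.step()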

It seems your code is going in the right direction, and the only missing component is how to sample blocks from a given set of seed nodes. That requires you to use NodeCollator directly. Here is an example:

import dgl
import torch
# example graph
ss = torch.randint(0, 100, (300,))
dd = torch.randint(0, 100, (300,))
g = dgl.heterograph({('A', 'AB', 'B'): (ss, dd), ('B', 'BA', 'A'): (dd, ss)})

# custom collator that samples blocks from a given set of seed nodes
sampler = dgl.dataloading.MultiLayerNeighborSampler([2, 2])
collator = dgl.dataloading.NodeCollator(g, {'A': torch.arange(100)}, sampler)
# sample blocks from a given set of seed nodes.
# the seed nodes must be in the following type-ID pair format
seed_nodes = [('A', 0), ('A', 1), ('A', 2)]
input_nodes, output_nodes, blocks = collator.collate(seed_nodes)
print(blocks)
print(blocks[-1].dstnodes['A'].data[dgl.NID])    # should be the same as seed_nodes, i.e. [0, 1, 2]
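To tie this back to your PinSAGE sampling, a minimal sketch continuing from the example above (the sampler parameters are illustrative assumptions) collates blocks for exactly the nodes that the PinSAGE sampler returns:

# sample PinSAGE neighbors of type-A nodes through their type-B neighbors
pinsage = dgl.sampling.PinSAGESampler(
    g, 'A', 'B',
    3,    # num_traversals
    0.5,  # termination_prob
    100,  # num_random_walks
    5)    # num_neighbors
frontier = pinsage(torch.arange(10))
# the frontier's edges connect each sampled neighbor to its seed node
neighbors, seeds = frontier.all_edges(form='uv')
# collate blocks for the union of the seed nodes and their sampled neighbors
unique_ids = torch.cat([seeds, neighbors]).unique()
seed_nodes = [('A', int(i)) for i in unique_ids]
input_nodes, output_nodes, blocks = collator.collate(seed_nodes)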