Training on a Heterogeneous Graph with multiple node types and different feature dimensions

Deb · September 22, 2021, 6:18pm

Hi,
I need advice on training graph algorithms on a heterogeneous graph.

I have created a heterogeneous graph in DGL with 2 node types and 1 edge type. The 3 feature tensors associated with them all have different shapes, as follows:

user - feature dimension - (3,3)
movie - feature dimension - (3,2)
user-movie - feature dimension - (4,2). This has one edge with two attributes.

I want to train a GAT on the graph that uses the edge attributes along with the node attributes. It seems that the feature input to GAT(in_dim=features.size()[1]) doesn’t allow node features of different dimensions (user has a dimension of 3 and movie has a dimension of 2). How should I handle the different node feature dimensions?

My second question is: how do I pass edge features to GAT?

Thanks

BarclayII · September 24, 2021, 3:16am

You can first project the node features of each type to the same dimension, using a separate MLP per node type.

GATs per se do not handle edge features. You will need to change the GAT module yourself. How do you plan to use the edge features (e.g. only for attention, for both attention and the messages sent from neighbors, or something else)?

Deb · September 24, 2021, 5:55am

So, for the data given in the example above, I would pass the user features (three feature columns) through an MLP to get an n-dimensional embedding vector, and pass the movie features through another MLP to get an n-dimensional vector as well. The resulting n-dimensional vectors for user and movie can then be used as the initial node features in the graph.

I plan to use the edge features both for attention and message passing.

BarclayII · September 26, 2021, 4:18am

Yes, that’s how I would do it.
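The per-type projection discussed above could look roughly like the sketch below. It uses plain PyTorch only (no DGL), random tensors with the shapes from the question, and a hypothetical module name `PerTypeProjection` with an arbitrary target size of 8; none of these names come from DGL.

```python
import torch
import torch.nn as nn


class PerTypeProjection(nn.Module):
    """Hypothetical helper: one small MLP per node type, all mapping to out_dim."""

    def __init__(self, in_dims, out_dim):
        super().__init__()
        # in_dims maps node type name -> raw feature size, e.g. {"user": 3, "movie": 2}
        self.proj = nn.ModuleDict({
            ntype: nn.Sequential(
                nn.Linear(dim, out_dim),
                nn.ReLU(),
                nn.Linear(out_dim, out_dim),
            )
            for ntype, dim in in_dims.items()
        })

    def forward(self, feats):
        # feats maps node type name -> (num_nodes_of_type, raw feature size)
        return {ntype: self.proj[ntype](x) for ntype, x in feats.items()}


torch.manual_seed(0)
# Shapes from the question: 3 users with 3 features, 3 movies with 2 features.
feats = {"user": torch.randn(3, 3), "movie": torch.randn(3, 2)}
projector = PerTypeProjection({"user": 3, "movie": 2}, out_dim=8)
h = projector(feats)
print({ntype: tuple(x.shape) for ntype, x in h.items()})
# prints {'user': (3, 8), 'movie': (3, 8)}
```

After this step every node type has features of the same size, so they can be assigned as node data on the graph and fed to a layer that expects a single input dimension. The projection MLPs are trained jointly with the rest of the model.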

So you will need to change GATConv’s implementation.

This is the place you compute attention:

https://github.com/dmlc/dgl/blob/master/python/dgl/nn/pytorch/conv/gatconv.py#L302-L310

    el = (feat_src * self.attn_l).sum(dim=-1).unsqueeze(-1)
    er = (feat_dst * self.attn_r).sum(dim=-1).unsqueeze(-1)
    graph.srcdata.update({'ft': feat_src, 'el': el})
    graph.dstdata.update({'er': er})
    # compute edge attention, el and er are a_l Wh_i and a_r Wh_j respectively.
    graph.apply_edges(fn.u_add_v('el', 'er', 'e'))
    e = self.leaky_relu(graph.edata.pop('e'))
    # compute softmax
    graph.edata['a'] = self.attn_drop(edge_softmax(graph, e))

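You can add another term, computed from the edge features, to the attention score. Adding it to e before the softmax keeps graph.edata['a'] normalized over each node's incoming edges.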
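As a rough illustration of an edge-feature term in the attention score, here is a sketch in plain PyTorch rather than inside GATConv. It mirrors the `el` / `er` computation above and adds a third term `ee`. The names `fc_edge`, `attn_e`, and `ee` are hypothetical (they are not part of DGL), the graph is a made-up set of 4 user-to-movie edges, and the softmax is written by hand with `index_add` instead of DGL's `edge_softmax`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
num_heads, out_dim = 2, 4
num_src, num_dst = 3, 3                    # 3 users (sources), 3 movies (destinations)
src = torch.tensor([0, 0, 1, 2])           # made-up edges: user -> movie
dst = torch.tensor([0, 1, 1, 2])

# Projected node features, shaped like feat_src / feat_dst in GATConv.
feat_src = torch.randn(num_src, num_heads, out_dim)
feat_dst = torch.randn(num_dst, num_heads, out_dim)
edge_feat = torch.randn(4, 2)              # 4 edges, 2 raw edge attributes

attn_l = nn.Parameter(torch.randn(1, num_heads, out_dim))
attn_r = nn.Parameter(torch.randn(1, num_heads, out_dim))
# Assumed additions: a projection and an attention vector for edge features.
fc_edge = nn.Linear(2, num_heads * out_dim, bias=False)
attn_e = nn.Parameter(torch.randn(1, num_heads, out_dim))

el = (feat_src * attn_l).sum(dim=-1)       # (num_src, heads)
er = (feat_dst * attn_r).sum(dim=-1)       # (num_dst, heads)
feat_edge = fc_edge(edge_feat).view(-1, num_heads, out_dim)
ee = (feat_edge * attn_e).sum(dim=-1)      # (num_edges, heads)

# Unnormalized score per edge: source term + destination term + edge term.
e = F.leaky_relu(el[src] + er[dst] + ee, negative_slope=0.2)

# Softmax over the incoming edges of each destination node.
# Subtracting the global max is a simple stabilizer that is fine for a sketch.
exp_e = (e - e.max()).exp()
denom = torch.zeros(num_dst, num_heads).index_add(0, dst, exp_e)
a = exp_e / denom[dst]                     # (num_edges, heads)
```

In a modified GATConv the same idea would amount to storing `ee` as edge data and adding it to `e` before the `leaky_relu` and `edge_softmax` calls. The projected `feat_edge` can also be reused as the per-edge message if the edge features should contribute to the aggregated output.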

This is the place where you compute a weighted average over the source node features:

https://github.com/dmlc/dgl/blob/master/python/dgl/nn/pytorch/conv/gatconv.py#L312-L313

    graph.update_all(fn.u_mul_e('ft', 'a', 'm'),
                     fn.sum('m', 'ft'))

You can also compute a weighted average over edge features, like this:

    # 'edge_msg' holds the per-edge message features, shaped to broadcast with 'a'
    graph.edata['weighted_edge_msg'] = graph.edata['a'] * graph.edata['edge_msg']
    graph.update_all(fn.copy_e('weighted_edge_msg', 'm'), fn.sum('m', 'ft_edge'))
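To make the two aggregations concrete, the sketch below reproduces them in plain PyTorch on a tiny made-up graph: one attention-weighted sum over source node features and one over edge messages. The helper `weighted_sum`, the uniform attention weights, and the final `ft + ft_edge` combination are assumptions for illustration only. Concatenating the two results would be an equally reasonable choice.

```python
import torch

torch.manual_seed(0)
num_heads, out_dim, num_dst = 2, 4, 3
src = torch.tensor([0, 0, 1, 2])            # made-up edges: user -> movie
dst = torch.tensor([0, 1, 1, 2])

feat_src = torch.randn(3, num_heads, out_dim)   # projected source node features
edge_msg = torch.randn(4, num_heads, out_dim)   # projected edge features, one per edge

# Stand-in attention: uniform over each destination's incoming edges, shape (E, heads, 1).
in_deg = torch.bincount(dst, minlength=num_dst).float()
a = (1.0 / in_deg[dst]).view(-1, 1, 1).expand(-1, num_heads, 1)


def weighted_sum(values, a, dst, num_dst):
    """Sum a * values over the incoming edges of every destination node."""
    out = torch.zeros(num_dst, *values.shape[1:], dtype=values.dtype)
    return out.index_add(0, dst, a * values)


# Counterpart of update_all(fn.u_mul_e('ft', 'a', 'm'), fn.sum('m', 'ft')).
ft = weighted_sum(feat_src[src], a, dst, num_dst)
# Counterpart of the copy_e / sum aggregation over weighted edge messages.
ft_edge = weighted_sum(edge_msg, a, dst, num_dst)

# Assumed combination of the two results.
rst = ft + ft_edge
print(tuple(rst.shape))
# prints (3, 2, 4)
```

Because the attention weights of each destination node sum to 1, both results are weighted averages: the first over neighbor features and the second over the features of the incoming edges.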