How to use input_features of different dimensions in heterogeneous graph link prediction

Hello, I am trying to build a link predictor with DGL and PyTorch for a heterogeneous graph in which every node type and edge type has features of a different dimension. I am following this doc, but it seems to say little about how to incorporate node/edge features of different dimensions. Is there another doc that covers this? How do I incorporate node and edge features of different dimensions in heterogeneous graph link prediction?

Hi, there are a couple of ways to do this; which one fits best depends on your scenario.

  • You could first apply a linear projection layer to each node/edge type so that all the features end up with the same length. Below is some demo code that does so:
    import torch

    # Suppose you have a heterograph with two types of nodes,
    # "user" and "item".
    # Define a module like this:
    class ProjectionLayer(torch.nn.Module):
        def __init__(self, in_sizes, out_size):
            # Required so that PyTorch can register the submodules below
            super().__init__()
            # One linear layer per node type, all mapping to out_size
            self.layers = torch.nn.ModuleDict({
                'user' : torch.nn.Linear(in_sizes['user'], out_size),
                'item' : torch.nn.Linear(in_sizes['item'], out_size)})
        def forward(self, feats):
            # user and item features can have different lengths but
            # will become the same after the projection layer
            return {'user' : self.layers['user'](feats['user']),
                    'item' : self.layers['item'](feats['item'])}
    
    You could plug this module into your model and pass the initial features to it.
  • If you’d like to use a different hidden size for each node/edge type (a little uncommon, but it may have some benefits), you could design a score function that works with different feature lengths. For example, you could modify the ScorePredictor to include a weight matrix W so that, instead of calculating h_u^T h_v as in the guide you read, it calculates h_u^T W h_v. The weight matrix acts like a linear projection that maps a node feature vector to another length, so this is quite similar to the first idea.
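As a rough sketch of the second idea, the score h_u^T W h_v could be computed as below. This is only an illustration under some assumptions: it uses plain PyTorch with explicit endpoint index tensors rather than a DGL graph, and the names `BilinearEdgeScorer`, `src_idx`, and `dst_idx` are hypothetical, not part of DGL or of the guide's ScorePredictor.

```python
import torch

class BilinearEdgeScorer(torch.nn.Module):
    """Scores an edge (u, v) as h_u^T W h_v; h_u and h_v may differ in length."""
    def __init__(self, src_size, dst_size):
        super().__init__()
        # W has shape (src_size, dst_size), so no shared hidden size is needed.
        self.W = torch.nn.Parameter(torch.empty(src_size, dst_size))
        torch.nn.init.xavier_uniform_(self.W)

    def forward(self, h_src, h_dst, src_idx, dst_idx):
        # h_src: (num_src_nodes, src_size), h_dst: (num_dst_nodes, dst_size)
        # src_idx, dst_idx: (num_edges,) endpoint node IDs of each edge
        projected = h_src[src_idx] @ self.W          # (num_edges, dst_size)
        return (projected * h_dst[dst_idx]).sum(-1)  # (num_edges,)

# Illustrative data: 5 users with 16-dim features, 7 items with 32-dim features
torch.manual_seed(0)
h_user = torch.randn(5, 16)
h_item = torch.randn(7, 32)
src = torch.tensor([0, 1, 4])   # user end of three user->item edges
dst = torch.tensor([2, 6, 3])   # item end of the same edges

scorer = BilinearEdgeScorer(16, 32)
scores = scorer(h_user, h_item, src, dst)  # one score per edge
```

With several edge types you would probably keep one such scorer per edge type (for instance in a `torch.nn.ModuleDict`), since each type can connect node types with different feature lengths.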

Thanks for the reply. According to the link, though, ‘pos_score’ and ‘neg_score’ are returned as dictionaries with one entry per edge type. Should I just sum the losses over the different edge types, or is there a better way to do it?

Summing them should be fine. Just be aware that the edge types in your graph may be unevenly distributed, so you may want to scale each type's loss with a suitable weight.
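A minimal sketch of such a weighted sum, assuming `pos_score` and `neg_score` are dictionaries mapping each edge type to a score tensor, that each positive edge has the same number of negatives stored contiguously, and that a margin loss is used. The function name `weighted_hetero_loss` and the `weights` argument are made up for illustration, and how to pick the weights is left open:

```python
import torch

def weighted_hetero_loss(pos_score, neg_score, weights=None):
    """Sum a margin loss over edge types, scaling each type by a weight.

    pos_score / neg_score: dict mapping edge type -> score tensor.
    weights: optional dict mapping edge type -> float (defaults to 1.0).
    """
    total = torch.zeros(())
    for etype, pos in pos_score.items():
        if pos.numel() == 0:
            continue  # skip edge types with no edges in this batch
        pos = pos.view(-1, 1)                          # (n, 1)
        neg = neg_score[etype].view(pos.shape[0], -1)  # (n, k) negatives
        loss = (1 - pos + neg).clamp(min=0).mean()     # margin loss for this type
        w = 1.0 if weights is None else weights.get(etype, 1.0)
        total = total + w * loss
    return total

# Illustrative scores for two hypothetical edge types
pos = {'clicks': torch.tensor([2.0, 2.0]), 'follows': torch.tensor([0.0])}
neg = {'clicks': torch.tensor([0.0, 0.0]), 'follows': torch.tensor([0.0])}
total = weighted_hetero_loss(pos, neg, weights={'clicks': 1.0, 'follows': 0.5})
```

Here the 'clicks' edges are already separated by more than the margin, so only the down-weighted 'follows' term contributes to the total.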
