How to use input_features of different dimensions in heterogeneous graph link prediction

tanvirbKash · January 11, 2022, 5:41am

Hello, I am trying to build a link predictor with DGL and PyTorch for a heterogeneous graph in which every node type and edge type has features of a different dimension. I am following this doc, but it does not explain how to incorporate node/edge features of different dimensions. Is there another doc for that? How do I incorporate node and edge features of different dimensions in heterogeneous graph link prediction?

minjie · January 12, 2022, 3:34am

Hi, there are a couple of ways to do so depending on which one is more suitable for your scenario.

You could first apply a linear projection layer to each node/edge type so that all the features end up with the same length. Here is some demo code that does this:

import torch

# Suppose you have a heterograph with two types of nodes,
# "user" and "item".
# Define a module like this:
class ProjectionLayer(torch.nn.Module):
    def __init__(self, in_sizes, out_size):
        # Required so that PyTorch can register the submodules below
        super().__init__()
        # One linear layer per node type, all mapping to out_size
        self.layers = torch.nn.ModuleDict({
            'user' : torch.nn.Linear(in_sizes['user'], out_size),
            'item' : torch.nn.Linear(in_sizes['item'], out_size)})
    def forward(self, feats):
        # user and item features can have different lengths but
        # will have the same length after the projection layer
        return {'user' : self.layers['user'](feats['user']),
                'item' : self.layers['item'](feats['item'])}

You could plug this module into your model and pass the initial features to it.
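As a rough usage sketch, the same idea can be written generically so that it builds one linear layer per type from whatever keys appear in `in_sizes`. The class name `TypedProjection`, the feature sizes (16 and 32), and the output size (8) are made up for illustration and are not part of the reply above.

```python
import torch
import torch.nn as nn

class TypedProjection(nn.Module):
    """Project per-type features of different sizes to one shared size."""

    def __init__(self, in_sizes, out_size):
        super().__init__()
        # One linear layer per node (or edge) type found in in_sizes
        self.layers = nn.ModuleDict({
            ntype: nn.Linear(in_size, out_size)
            for ntype, in_size in in_sizes.items()
        })

    def forward(self, feats):
        # feats maps a type name to a (num_nodes, in_size) tensor
        return {ntype: self.layers[ntype](x) for ntype, x in feats.items()}

# Hypothetical input: 4 users with 16-dim features, 6 items with 32-dim features
feats = {'user': torch.randn(4, 16), 'item': torch.randn(6, 32)}
proj = TypedProjection({'user': 16, 'item': 32}, out_size=8)
projected = proj(feats)
projected_shapes = {k: tuple(v.shape) for k, v in projected.items()}
```

The projected dictionary can then be passed as the input features of the heterogeneous GNN, since every type now has the same feature length.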

If you’d like to use a different hidden size for each node/edge type (a little uncommon, but it may have some benefits), you could design a score function that works with different feature lengths. For example, you could modify the ScorePredictor to include a weight matrix W so that, instead of calculating h_u^T h_v as in the guide you read, it calculates h_u^T W h_v. The weight matrix acts like a linear projection that turns a node feature vector into one of another length, so this is quite similar to the first idea.
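A minimal sketch of the h_u^T W h_v score, written in plain PyTorch rather than as a drop-in replacement for the guide's ScorePredictor. It assumes the edges of one edge type are given as two index tensors (`src_idx`, `dst_idx`); the class name `BilinearScorer` and all sizes are hypothetical. In a DGL model the same computation would go inside the per-edge-type scoring step.

```python
import torch
import torch.nn as nn

class BilinearScorer(nn.Module):
    """Score an edge (u, v) as h_u^T W h_v, where h_u and h_v may differ in length."""

    def __init__(self, src_size, dst_size):
        super().__init__()
        # W maps a source-type vector into the destination-type space
        self.W = nn.Parameter(torch.empty(src_size, dst_size))
        nn.init.xavier_uniform_(self.W)

    def forward(self, h_src, h_dst, src_idx, dst_idx):
        hu = h_src[src_idx]                      # (E, src_size)
        hv = h_dst[dst_idx]                      # (E, dst_size)
        return ((hu @ self.W) * hv).sum(dim=-1)  # (E,)

# Hypothetical hidden states: users have 8 dims, items have 12 dims
h_user = torch.randn(5, 8)
h_item = torch.randn(7, 12)
src_idx = torch.tensor([0, 1, 4])   # user end of each edge
dst_idx = torch.tensor([2, 2, 6])   # item end of each edge

scorer = BilinearScorer(src_size=8, dst_size=12)
scores = scorer(h_user, h_item, src_idx, dst_idx)
```

With one such scorer per edge type (for example, stored in an `nn.ModuleDict`), each edge type can connect node types with different hidden sizes.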

tanvirbKash · January 12, 2022, 6:23am

Thanks for the reply, but according to the link, ‘pos_score’ and ‘neg_score’ are returned as dictionaries with one entry per edge type. Should I just sum the losses over the different edge types, or is there a better way to do it?

minjie · January 12, 2022, 8:26am

Summing them should be fine. Just be aware that the edge types in your graph may be unevenly distributed, so you may want to scale each type's loss with a suitable weight.
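One way this could look, as a sketch only: compute a loss per edge type and add them up with optional per-type weights. The margin loss here follows the usual hinge form for link prediction; the function names, the edge type names, and the weight values are hypothetical, and the scores are assumed to be flat 1-D tensors with k negatives per positive edge.

```python
import torch

def margin_loss(pos, neg):
    # pos: (E,) scores of real edges; neg: (E * k,) scores of k negatives per edge
    n_edges = pos.shape[0]
    return (1 - pos.unsqueeze(1) + neg.view(n_edges, -1)).clamp(min=0).mean()

def weighted_total_loss(pos_score, neg_score, weights=None):
    # pos_score / neg_score: dicts keyed by edge type
    total = 0.0
    for etype, pos in pos_score.items():
        # Edge types without an explicit weight default to 1.0
        w = 1.0 if weights is None else weights.get(etype, 1.0)
        total = total + w * margin_loss(pos, neg_score[etype])
    return total

# Toy scores for two hypothetical edge types
pos_score = {'clicks': torch.tensor([2.0, 2.0]), 'follows': torch.tensor([0.0])}
neg_score = {'clicks': torch.zeros(4), 'follows': torch.tensor([0.0])}

plain = weighted_total_loss(pos_score, neg_score)
scaled = weighted_total_loss(pos_score, neg_score,
                             weights={'clicks': 1.0, 'follows': 0.5})
```

Because each per-type loss is already a mean over that type's edges, the weights control how much each edge type contributes regardless of how many edges it has.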
