Can the reduce function return tensors with different dimensions?

In my code, I call update_all as follows:

self.g.update_all(gcn_msg, gcn_reduce, self.node_update)

My reduce function is as follows:

def gcn_reduce(node):
    accum = node.mailbox['m']  # return the stacked messages directly, without summing
    return {'h': accum}

which differs from the original implementation:

def gcn_reduce(node):
    # Sum the messages over the neighbor dimension, then apply the normalizer.
    accum = torch.sum(node.mailbox['m'], 1) * node.data['norm']
    return {'h': accum}

My code fails to run with an error.

I wonder: can the reduce function return tensors with different dimensions?

Also, can we apply self.node_update according to the in-degrees of the nodes?

  1. The problem is that we store node/edge features in a “table” keyed by field name ('h' in your case), and all the features in that table are expected to have the same shape. When you need to deal with tensors of different shapes, one possible workaround is to store them in different fields.
  2. You can define a function that performs some operation based on node degrees and use it in place of self.node_update. When defining the node update function, you can access the nodes’ in-degrees with nodes._g.in_degrees(nodes.nodes()); see the sketch after this list.
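
For reference, here is a minimal sketch of such a node update function. It assumes the DGL version discussed in this thread, where a NodeBatch exposes nodes() and the internal graph handle _g mentioned above; the divide-by-degree operation is just a placeholder.

def degree_node_update(nodes):
    # In-degrees of the nodes in this batch, accessed as described above.
    # clamp avoids dividing by zero for nodes with no incoming edges.
    deg = nodes._g.in_degrees(nodes.nodes()).float().clamp(min=1)
    # Placeholder degree-dependent operation: rescale 'h' by the in-degree.
    return {'h': nodes.data['h'] / deg.unsqueeze(-1)}

You would then pass it as the third argument: self.g.update_all(gcn_msg, gcn_reduce, degree_node_update).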

Could you share the scenario in which you want to try this?

As @mufeili mentioned above, the problem is the feature shape. Consider the graph below:

0<-1
1<-2
1<-3

When reducing on nodes 0 and 1, node #0 receives one message while node #1 receives two. With your reduce function, the h feature of node #0 is 1xD while that of node #1 is 2xD, making it hard to store them compactly.
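
As a concrete illustration, here is a minimal sketch of that graph written against a recent DGL API (which may differ from the version used in this thread), printing the per-bucket mailbox shapes:

import dgl
import torch

# The example graph: edges 1->0, 2->1, 3->1, with feature dimension D = 4.
g = dgl.graph(([1, 2, 3], [0, 1, 1]))
g.ndata['x'] = torch.randn(4, 4)

def gcn_msg(edges):
    return {'m': edges.src['x']}

def gcn_reduce(nodes):
    # Called once per degree bucket: for node 0 the mailbox is 1 x 1 x D,
    # for node 1 it is 1 x 2 x D. Returning the mailbox directly as 'h'
    # would therefore store tensors of different shapes under one field.
    print(nodes.mailbox['m'].shape)
    return {'h': torch.sum(nodes.mailbox['m'], 1)}  # summing restores a fixed shape

g.update_all(gcn_msg, gcn_reduce)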

Is gcn_reduce called based on node degrees?

Yeah, this is called degree bucketing: at each step we process the nodes that have the same number of incoming messages.
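
Conceptually, degree bucketing works like the following pure-PyTorch sketch (an illustration only, not DGL's actual implementation):

import torch

def bucketed_reduce(dst_ids, messages, reduce_udf):
    # dst_ids: (E,) destination node of each message; messages: (E, D).
    # Group destination nodes by in-degree so each call to the reduce UDF
    # sees a dense (num_nodes_in_bucket, degree, D) mailbox.
    results = {}
    degrees = torch.bincount(dst_ids)
    for deg in degrees.unique().tolist():
        if deg == 0:
            continue  # nodes with no incoming messages are skipped
        bucket = (degrees == deg).nonzero(as_tuple=True)[0]
        mailbox = torch.stack([messages[dst_ids == n] for n in bucket])
        results[deg] = (bucket, reduce_udf(mailbox))
    return results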

@mufeili
In the following code:

self.g.update_all(gcn_msg, gcn_reduce, self.node_update)

Can we also apply self.node_update with degree bucketing? I think it would be more flexible.

I wonder how self.node_update could be performed based on node degrees.

It could. But because the node update function can be applied to all nodes in parallel, we don’t apply it by degrees. If you want to apply the function by degree, you could merge it with the reduce function.
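
For instance, here is a sketch of a reduce function with the degree-dependent update folded in (the averaging step is just a placeholder):

import torch

def gcn_reduce_and_update(nodes):
    # Thanks to degree bucketing, all nodes in one call share the same
    # in-degree, which is simply the mailbox's second dimension.
    deg = nodes.mailbox['m'].shape[1]
    accum = torch.sum(nodes.mailbox['m'], 1)
    # Placeholder degree-dependent update: average instead of sum.
    return {'h': accum / deg}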

You mentioned that “it is more flexible”. Could you please elaborate?

I see. Now I think it is the same as merging it with the reduce function.