Increasing feature dimension

Hi,

In some of the example code in the docs I see this method for going to a higher feature dimension (more features per node):

 self.conv1 = dglnn.GraphConv(in_dim, hidden_dim)      # in_dim -> hidden_dim
 self.conv2 = dglnn.GraphConv(hidden_dim, hidden_dim)  # hidden_dim -> hidden_dim

Here the first graph convolution layer is used to go to a higher feature dimension. However, if my in_dim is 6 and I want to go to, for instance, a hidden_dim of 64, does this actually work well? As I understand it, the aggregation function in GCNs is often something like a sum or average. If I sum/average the 6 features of the neighbouring nodes and only then apply the neural-network part of the GCN to go to 64 features, then a lot of the information that was encoded in my neighbouring nodes is already lost by the sum/avg operation before any learning is applied, right?

Wouldn’t it then be better to, in some way, blow up the 6 features to 64 features per node and only then apply the GCN?

Kind regards,

Erik

  1. A GraphConv layer uses a linear layer to transform node features. When using linear reduce functions like sum or mean, mathematically the order of feature transformation and aggregation does not matter, i.e. (AX)W = A(XW); see the quick check after this list.
  2. Most graph neural networks assume that nodes that are close should have similar features or labels, which is why sum/avg will not cause much information loss with respect to the downstream tasks.
  3. You can also add a residual layer/skip connection to preserve the original node features (a minimal sketch follows below as well).
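
To make point 1 concrete, here is a quick numerical check in plain PyTorch (with a toy 3-node adjacency matrix and random features of my own choosing) that aggregating first and transforming first give the same result when the reducer is a sum:

    import torch

    torch.manual_seed(0)

    # Toy example: 3 nodes, 6 input features, 64 output features.
    A = torch.tensor([[0., 1., 1.],
                      [1., 0., 0.],
                      [1., 0., 0.]])   # adjacency matrix, i.e. sum aggregation
    X = torch.randn(3, 6)              # node features
    W = torch.randn(6, 64)             # weight of the linear layer

    out_aggregate_first = (A @ X) @ W  # aggregate, then transform
    out_transform_first = A @ (X @ W)  # transform, then aggregate

    print(torch.allclose(out_aggregate_first, out_transform_first, atol=1e-5))  # True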
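
As a rough illustration of point 3, a skip connection could look like the sketch below (a sketch only, with my own module and dimension names, assuming PyTorch and dgl.nn): the raw input features are concatenated back onto the GraphConv output, so they survive the aggregation untouched.

    import torch
    import torch.nn as nn
    import dgl.nn as dglnn

    class GCNWithSkip(nn.Module):
        """GraphConv layer whose output is concatenated with the raw input features."""
        def __init__(self, in_dim, hidden_dim):
            super().__init__()
            self.conv = dglnn.GraphConv(in_dim, hidden_dim)
            # Project [conv output, original features] back down to hidden_dim.
            self.proj = nn.Linear(hidden_dim + in_dim, hidden_dim)

        def forward(self, g, x):
            h = torch.relu(self.conv(g, x))
            h = torch.cat([h, x], dim=-1)   # skip connection: keep the untouched features
            return self.proj(h)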

Thank you for the quick reply and explanations! Really helps :slight_smile:

  1. A GraphConv layer uses a linear layer to transform node features. When using linear reduce functions like sum or mean, mathematically the order of feature transformation and aggregation does not matter

Does this also apply to the first GCN layer/step? There we first aggregate the 6 initial features per node of all neighbouring nodes (features to which no learning has been applied yet, so we also cannot improve on this step except through what we do with the aggregated features later) and only then apply the linear layer to transform the features. This is how I see it, please correct me if this is wrong:


Then it would matter if we first go to 64 features per node with some NN before aggregating vs. aggregating 6 features per node and then applying a NN to go from 6 to 64 features, right?

  1. Most graph neural networks assume that nodes that are close should have similar features or labels, which is why sum/avg will not cause much information loss with respect to the downstream tasks.

I think for me this assumption will not hold. I am applying GNNs for RL and I basically have a graph for each observation, where my observations are gridworld observations and each node is a grid cell. So features can be very different for adjacent nodes. Would a residual layer/skip connection still be the solution here, or would you then still suggest another approach?

Kind regards,
Erik

Then it would matter if we first go to 64 features per node with some NN before aggregating vs. aggregating 6 features per node and then applying a NN to go from 6 to 64 features, right?

It depends on whether the neural network is nonlinear. If it is a single linear layer, then no. If it contains a nonlinearity, then yes.
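
For instance, the "transform first" variant could look roughly like this (a sketch only; the encoder and dimension names are mine, assuming PyTorch and dgl.nn). Because of the ReLU, this is not equivalent to aggregating the raw 6-dimensional features first:

    import torch.nn as nn
    import dgl.nn as dglnn

    class EncodeThenConv(nn.Module):
        def __init__(self, in_dim=6, hidden_dim=64):
            super().__init__()
            # Nonlinear per-node encoder applied before any neighbour aggregation.
            self.encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
            self.conv = dglnn.GraphConv(hidden_dim, hidden_dim)

        def forward(self, g, x):
            h = self.encoder(x)      # 6 -> 64 per node, with a nonlinearity
            return self.conv(g, h)   # aggregation now mixes the encoded features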

I think for me this assumption will not hold. I am applying GNNs for RL and I basically have a graph for each observation, where my observations are gridworld observations and each node is a grid cell. So features can be very different for adjacent nodes. Would a residual layer/skip connection still be the solution here, or would you then still suggest another approach?

I’m not very familiar with GNN + RL, so I cannot help much here. If the graphs are grids, have you tried ConvNets?

Thank you again! I will also look into GIN, I think this will help here as well.
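
(For my own reference, a GIN layer in DGL could look roughly like the sketch below, with made-up MLP sizes. As I understand it, GINConv still sums the neighbour features first and then applies a learnable MLP, so its extra power comes from the nonlinear MLP and the learnable epsilon rather than from transforming before aggregation.)

    import torch.nn as nn
    import dgl.nn as dglnn

    # GIN update: h_v = MLP((1 + eps) * h_v + sum of neighbour features)
    mlp = nn.Sequential(
        nn.Linear(6, 64),
        nn.ReLU(),
        nn.Linear(64, 64),
    )
    gin_layer = dglnn.GINConv(mlp, aggregator_type='sum', learn_eps=True)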

I’m not very familiar with GNN + RL, so I cannot help much here. If the graphs are grids, have you tried ConvNets?

Yep, part of what I am doing is researching the difference between the two :slight_smile: Later on I will also move to fully connected graphs with attention. Indeed, for the grid version a CNN will probably work better!

Kind regards,

Erik