Increasing feature dimension

Hi,

In some of the example code in the docs I see this method for going to a higher feature dimension (more features per node):

 self.conv1 = dglnn.GraphConv(in_dim, hidden_dim)      # in_dim -> hidden_dim
 self.conv2 = dglnn.GraphConv(hidden_dim, hidden_dim)  # hidden_dim -> hidden_dim

Here the first graph convolution layer is used to go to a higher feature dimension. However, if my in_dim is 6 and I want to go to, for instance, a hidden_dim of 64, does this actually work well? As I understand it, the aggregation function in GCNs is often something like a sum or average. If I sum/average the 6 features of the neighbouring nodes and only then apply the neural-network part of the GCN to go to 64 features, then a lot of the information that was encoded in my neighbouring nodes is already lost by the sum/avg operation before any learning is applied, right?

Wouldn’t it then be better to, in some way, blow up the 6 features to 64 features per node and only then apply the GCN?

Kind regards,

Erik

  1. A GraphConv layer uses a linear layer to transform node features. When using linear reduce functions like sum or mean, mathematically the order of feature transformation and aggregation does not matter, i.e. (AX)W = A(XW); see the quick check after this list.
  2. Most graph neural networks assume that nodes that are close should have similar features or labels, which is why sum/avg will not cause much information loss with respect to the downstream tasks.
  3. You can also add a residual layer/skip connection to preserve the original node features (a minimal sketch follows below as well).
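
To make point 1 concrete, here is a quick numerical check in plain PyTorch (with a toy 3-node adjacency matrix and random features of my own choosing) that aggregating first and transforming first give the same result when the reducer is a sum:

    import torch

    torch.manual_seed(0)

    # Toy example: 3 nodes, 6 input features, 64 output features.
    A = torch.tensor([[0., 1., 1.],
                      [1., 0., 0.],
                      [1., 0., 0.]])   # adjacency matrix, i.e. sum aggregation
    X = torch.randn(3, 6)              # node features
    W = torch.randn(6, 64)             # weight of the linear layer

    out_aggregate_first = (A @ X) @ W  # aggregate, then transform
    out_transform_first = A @ (X @ W)  # transform, then aggregate

    print(torch.allclose(out_aggregate_first, out_transform_first, atol=1e-5))  # True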
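
As a rough illustration of point 3, a skip connection could look like the sketch below (a sketch only, with my own module and dimension names, assuming PyTorch and dgl.nn): the raw input features are concatenated back onto the GraphConv output, so they survive the aggregation untouched.

    import torch
    import torch.nn as nn
    import dgl.nn as dglnn

    class GCNWithSkip(nn.Module):
        """GraphConv layer whose output is concatenated with the raw input features."""
        def __init__(self, in_dim, hidden_dim):
            super().__init__()
            self.conv = dglnn.GraphConv(in_dim, hidden_dim)
            # Project [conv output, original features] back down to hidden_dim.
            self.proj = nn.Linear(hidden_dim + in_dim, hidden_dim)

        def forward(self, g, x):
            h = torch.relu(self.conv(g, x))
            h = torch.cat([h, x], dim=-1)   # skip connection: keep the untouched features
            return self.proj(h)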

Thank you for the quick reply and explanations! Really helps :slight_smile:

  1. A GraphConv layer uses a linear layer to transform node features. When using linear reduce functions like sum or mean, mathematically the order of feature transformation and aggregation does not matter

Does this also apply to the first GCN layer/step? There we first aggregate the 6 initial features per node of all neighbouring nodes (features to which no learning has been applied yet, so we also cannot improve on this step except through what we do with the aggregated features later) and only then apply the linear layer to transform the features. This is how I see it, please correct me if this is wrong:


Then it would matter if we first go to 64 features per node with some NN before aggregating vs. aggregating 6 features per node and then applying a NN to go from 6 to 64 features, right?

  1. Most graph neural networks assume that nodes that are close should have similar features or labels, which is why sum/avg will not cause much information loss with respect to the downstream tasks.

I think for me this assumption will not hold. I am applying GNNs for RL and I basically have a graph for each observation, where my observations are gridworld observations and each node is a grid cell. So features can be very different for adjacent nodes. Would a residual layer/skip connection still be the solution here, or would you then still suggest another approach?

Kind regards,
Erik

Then it would matter if we first go to 64 features per node with some NN before aggregating vs. aggregating 6 features per node and then applying a NN to go from 6 to 64 features, right?

It depends on whether the neural network is nonlinear. If it is a single linear layer, then no. If it contains a nonlinearity, then yes.
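
For instance, the "transform first" variant could look roughly like this (a sketch only; the encoder and dimension names are mine, assuming PyTorch and dgl.nn). Because of the ReLU, this is not equivalent to aggregating the raw 6-dimensional features first:

    import torch.nn as nn
    import dgl.nn as dglnn

    class EncodeThenConv(nn.Module):
        def __init__(self, in_dim=6, hidden_dim=64):
            super().__init__()
            # Nonlinear per-node encoder applied before any neighbour aggregation.
            self.encoder = nn.Sequential(nn.Linear(in_dim, hidden_dim), nn.ReLU())
            self.conv = dglnn.GraphConv(hidden_dim, hidden_dim)

        def forward(self, g, x):
            h = self.encoder(x)      # 6 -> 64 per node, with a nonlinearity
            return self.conv(g, h)   # aggregation now mixes the encoded features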

I think for me this assumption will not hold. I am applying GNNs for RL and I basically have a graph for each observation, where my observations are gridworld observations and each node is a grid cell. So features can be very different for adjacent nodes. Would a residual layer/skip connection still be the solution here, or would you then still suggest another approach?

I’m not very familiar with GNN + RL, so I cannot help much here. If the graphs are grids, have you tried ConvNets?

Thank you again! I will also look into GIN, I think this will help here as well.
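
(For my own reference, a GIN layer in DGL could look roughly like the sketch below, with made-up MLP sizes. As I understand it, GINConv still sums the neighbour features first and then applies a learnable MLP, so its extra power comes from the nonlinear MLP and the learnable epsilon rather than from transforming before aggregation.)

    import torch.nn as nn
    import dgl.nn as dglnn

    # GIN update: h_v = MLP((1 + eps) * h_v + sum of neighbour features)
    mlp = nn.Sequential(
        nn.Linear(6, 64),
        nn.ReLU(),
        nn.Linear(64, 64),
    )
    gin_layer = dglnn.GINConv(mlp, aggregator_type='sum', learn_eps=True)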

I’m not very familiar with GNN + RL, so I cannot help much here. If the graphs are grids, have you tried ConvNets?

Yep, part of what I am doing is researching the difference between the two :slight_smile: Later on I will also move to fully connected graphs with attention. Indeed, for the grid version a CNN will probably work better!

Kind regards,

Erik