A naive question: what exactly are the parameters optimized by a GNN?

user1 · April 11, 2020, 9:44am

For a given GNN model, we first initialize the node representations (with an initial embedding layer), then aggregate information from neighboring nodes, and finally use the aggregated representations to compute the loss for some task and backpropagate.
Why don’t we use the representations from the embedding layer directly? Am I misunderstanding something?
Other tasks also use the representations after aggregation, so what is the embedding layer for? Does it only provide parameters?

Thanks for your help!

mufeili · April 11, 2020, 10:09am

I’m not sure if I fully understand your questions, but below are some possible explanations:

The initial node representations can be some node features rather than the output of embedding layers. For example, they can be bag-of-words representations for citation networks where nodes represent articles.
The message passing operation updates the representation of a node by gathering information from its neighbors. With k rounds of message passing, we essentially gather information from k-hop neighborhoods for nodes. For each round of message passing, we typically have separate parameters to transform the features of neighbors. An initial embedding layer may not be expressive enough since we need some non-linearity and projection between rounds of message passing.
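
To illustrate the k-hop point, here is a minimal pure-Python sketch. It does not compute real features; it only tracks which nodes' information has reached each node after a given number of rounds. The toy path graph and the names `neighbors`, `message_passing_round`, and `receptive_field` are made up for this example.

```python
# Toy path graph 0 - 1 - 2 - 3 - 4, stored as an adjacency list (an assumption
# for illustration, not taken from any dataset).
neighbors = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2, 4], 4: [3]}


def message_passing_round(seen):
    """One round: every node merges what each of its neighbors currently holds."""
    return {
        node: seen[node].union(*(seen[nbr] for nbr in nbrs))
        for node, nbrs in neighbors.items()
    }


def receptive_field(num_rounds):
    """Return, for each node, the set of nodes whose information it has gathered."""
    # Before any message passing, a node only holds its own feature.
    seen = {node: {node} for node in neighbors}
    for _ in range(num_rounds):
        seen = message_passing_round(seen)
    return seen


for k in range(3):
    print(k, sorted(receptive_field(k)[0]))
```

With each extra round, node 0 reaches one hop further along the path. A real GNN layer would additionally apply its own learnable transformation and a non-linearity in every round.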

user1 · April 11, 2020, 11:45am

So the initial feature representations will not be updated?
What if we don’t have any initial features to use?
Then we have to randomly initialize the node representations, for example with an initial Embedding layer.
Then each node will have another representation after aggregation.

mufeili · April 11, 2020, 11:46am

Exactly.

user1 · April 11, 2020, 11:50am

In that case, is it equivalent to treating the randomly initialized (otherwise non-existent) features as parameters to optimize, and then using the aggregated representations for other tasks?

mufeili · April 11, 2020, 12:17pm

Even in this case, we still have separate nonlinearity and learnable weights within each graph neural network layer.
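
A minimal sketch of this, assuming plain PyTorch with a dense, row-normalized adjacency matrix rather than any particular GNN library. `TinyGNN`, the toy graph, and the labels are all hypothetical and only meant to show which tensors receive gradients.

```python
import torch
import torch.nn as nn


class TinyGNN(nn.Module):
    """Hypothetical two-layer GNN for a graph without input features."""

    def __init__(self, num_nodes, hidden_dim, num_classes):
        super().__init__()
        # Randomly initialized node representations, learned like any other weight.
        self.embed = nn.Embedding(num_nodes, hidden_dim)
        # Each round of message passing has its own learnable transformation.
        self.layer1 = nn.Linear(hidden_dim, hidden_dim)
        self.layer2 = nn.Linear(hidden_dim, num_classes)

    def forward(self, adj_norm):
        h = self.embed.weight                       # initial representations
        h = torch.relu(self.layer1(adj_norm @ h))   # round 1: aggregate, transform, non-linearity
        return self.layer2(adj_norm @ h)            # round 2: aggregate, transform


torch.manual_seed(0)

# Toy 4-node path graph with self-loops; rows are normalized for mean aggregation.
adj = torch.tensor(
    [[1, 1, 0, 0],
     [1, 1, 1, 0],
     [0, 1, 1, 1],
     [0, 0, 1, 1]],
    dtype=torch.float32,
)
adj_norm = adj / adj.sum(dim=1, keepdim=True)
labels = torch.tensor([0, 0, 1, 1])  # made-up node labels

model = TinyGNN(num_nodes=4, hidden_dim=8, num_classes=2)
loss = nn.functional.cross_entropy(model(adj_norm), labels)
loss.backward()

# Everything listed here is updated by the optimizer.
param_names = [name for name, _ in model.named_parameters()]
for name, param in model.named_parameters():
    print(name, tuple(param.shape), param.grad is not None)
```

The loss is computed on the aggregated output, but the gradient flows back through both layers into the embedding table, so the embeddings and the per-layer weights are trained together.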

user1 · April 11, 2020, 12:43pm

Yes, so the parameters are the Embedding layer and the other learnable weights.
Thank you for your patience.
Thank you!