Node feature update?

As far as I have observed, the examples in the DGL documentation are all about taking a graph and updating node features within that graph through iteration.

However, what I want to do is as below.

I have a look-up table (for NLP word embeddings) and use the embeddings corresponding to each node as its features.
However, as there are a lot of sentences, I am not building ‘one’ graph, but a lot of ‘graphs’.

Is it okay to assign node features to each graph every epoch, after I get the embeddings from the look-up table?

I need to update the embeddings in the look-up table, but I have to re-initialize the node features at every step. I am using GAT, and the accuracy is so low that I doubt whether using the DGL library is even justified.
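For concreteness, this is roughly the pattern I have in mind each epoch (the sizes, the token ids, and the chain-shaped edges are just placeholders for my real data and graph construction):

import dgl
import torch
import torch.nn as nn

vocab_size, embed_dim = 100, 16                    # toy sizes, not my real ones
embedding = nn.Embedding(vocab_size, embed_dim)    # the look-up table being trained

# placeholder tokenized sentences (tensors of token ids)
sentences = [torch.tensor([1, 5, 7]), torch.tensor([2, 3, 9, 4])]

graphs = []
for token_ids in sentences:
    n = len(token_ids)
    # placeholder edges: a simple chain over the tokens (my real edges differ)
    g = dgl.graph((torch.arange(n - 1), torch.arange(1, n)), num_nodes=n)
    # re-assign node features from the look-up table; this is redone every epoch
    g.ndata['x'] = embedding(token_ids)
    graphs.append(g)

batched = dgl.batch(graphs)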

I need help!!

Your proposal sounds fine to me and DGL should support that. DGL is a library for graph neural networks (GNNs), and the question here may be more about whether GNNs are useful for your task. Could you provide more details on the experiment, the performance numbers, and the baselines you are comparing against?

I am comparing it with a vanilla Transformer.
For a fair comparison, I connected all the nodes to each other in the GAT graph, roughly as below.
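This is a sketch of that construction for one sentence of n tokens (only the number is a placeholder):

import dgl
import torch

n = 5                                        # number of tokens in one sentence
src = torch.arange(n).repeat_interleave(n)   # 0,0,...,0,1,1,...,1,...
dst = torch.arange(n).repeat(n)              # 0,1,...,n-1,0,1,...,n-1,...
g = dgl.graph((src, dst), num_nodes=n)       # every node connected to every node, self-loops included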

For token classification, the vanilla Transformer achieves about 93% accuracy,
but GAT stays at around 19%.

Since I am updating the features of every instance each epoch
(in my case, ‘updating’ means doing it in place),
I suspect the problem might be there.

Of course, the input is ordered, and the order is the same for both the Transformer and GAT.

GAT is still different from vanilla Transformer due to:

  • a different attention computation mechanism
  • a lack of positional encoding (see the sketch after this list)
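Regarding the second point, one thing you could try is adding standard sinusoidal positional encodings to the node features before the GAT layers. A minimal sketch, assuming feat is the (num_tokens, dim) feature matrix of a single sentence graph and dim is even:

import math
import torch

def add_positional_encoding(feat):
    # feat: (num_tokens, dim) node features of one sentence, in token order
    n, dim = feat.shape
    pos = torch.arange(n, dtype=torch.float32).unsqueeze(1)                  # (n, 1)
    div = torch.exp(torch.arange(0, dim, 2, dtype=torch.float32)
                    * (-math.log(10000.0) / dim))                            # (dim/2,)
    pe = torch.zeros(n, dim)
    pe[:, 0::2] = torch.sin(pos * div)
    pe[:, 1::2] = torch.cos(pos * div)
    return feat + pe

For a batched graph you would need to compute the positions per sentence, i.e. apply this before dgl.batch.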

Meanwhile, as you mentioned, the feature update may not be done properly. Could you post a minimal runnable example for us to take a deeper look?

This is part of the GAT code.

def forward(self, g):
    # g: a batched DGLGraph whose node features are stored in ndata['x']
    feat = g.ndata.pop('x')

    # all but the last GAT layer: concatenate the attention heads
    for l in range(self.num_layers):
        feat = self.gat_layers[l](g, feat).flatten(1)
    # last GAT layer: average over the heads to get per-node logits
    feat = self.gat_layers[-1](g, feat).mean(1)

    g.ndata['logit'] = feat

    # split the batched graph back into per-sentence graphs
    net_output = dgl.unbatch(g)

    return net_output

and the input of the forward function is built as below:

# The preceding code builds each graph object with
# dgl.graph(...) and assigns the node features to ndata['x'].

graphs = dgl.batch(graphs)

return graphs

graphs passed to dgl.batch is a list of graph objects
that I build every epoch with the updated word embeddings as node features.

Thank you, mufeili.

Are you also updating the word embeddings? If so, can you check whether feat has a gradient after loss.backward()?
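For example, one simple check on the look-up table itself (this assumes the table is an nn.Embedding stored as model.embedding; adjust the names to your code):

loss.backward()

grad = model.embedding.weight.grad
print(grad)                       # should not be None
if grad is not None:
    print(grad.abs().sum())       # should be > 0 if gradients are flowing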

Yes, I am, and I also verified that the gradient was actually flowing. As far as I know, GAT does not use Q/K/V attention, but the attention mechanism GAT does use is still relatively powerful. Is it possible for it to perform so much worse than a Transformer? I suspect my code, but I cannot find what is wrong.

Do you have any idea what might be causing the degradation?

Thank you, mufeili.

You can use randomly initialized word embeddings without updating them and see if the performance gap is still huge.

If so, then the difference in the attention mechanism and the lack of positional encoding likely play a critical role.
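For example (vocab_size and embed_dim stand in for your actual sizes):

import torch.nn as nn

vocab_size, embed_dim = 30000, 300                 # placeholders
embedding = nn.Embedding(vocab_size, embed_dim)    # randomly initialized
embedding.weight.requires_grad_(False)             # frozen: not updated during training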

Oh, that is good advice. Thank you, mufeili. I’m going to try it!
