Loss explodes when training Deep Graph Infomax

Has anyone faced this problem? I am training DGI using the model provided in dgl/examples.

I tried to run the commands in the README (dgl/examples/pytorch/dgi at master · dmlc/dgl · GitHub) and everything works as expected. I built DGL from the latest master branch on Ubuntu.

Which DGL version/platform are you using, and what exact command are you using to run the example?

I used the model provided in dgl/examples/pytorch and a model provided by PyG; the loss explodes after many epochs.

Is there anything wrong with my graph construction?


The graph construction looks good to me.

Since you used multiple models in your training, there may be something unexpected in your training setup. Have you tried early stopping or tuning the learning rate? Why do you run so many epochs? Is the validation loss still decreasing?

I first used the official PyG code (pytorch_geometric/infomax_inductive.py at master · pyg-team/pytorch_geometric · GitHub), and the loss explosion occurred in the first few epochs. After that, I used the DGI code provided by DGL: first I changed the original code to minibatch training, but the loss explosion still occurred; then I used the original full-graph model.
I implemented early stopping. To see whether the loss would eventually explode, I set the number of epochs to be very large and the early-stopping patience to 100. On this graph with 125,730 nodes and 195,448,566 edges, the loss keeps declining for many rounds of training.
Note that this is the training loss, not the validation loss.
The key problem is that the loss suddenly increases, and training then hits the early-stopping threshold.
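For reference, my early-stopping logic is roughly the following minimal sketch (the exact implementation in my script may differ in details):

```python
# Minimal early-stopping helper: stop once the training loss has not
# improved for `patience` consecutive checks.
class EarlyStopping:
    def __init__(self, patience=100):
        self.patience = patience
        self.best = float('inf')
        self.counter = 0

    def step(self, loss):
        """Return True when training should stop."""
        if loss < self.best:
            self.best = loss
            self.counter = 0
        else:
            self.counter += 1
        return self.counter >= self.patience
```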

Hi @Rhett-Ying, I have another question: if my node features are randomly initialized, should I add the features to the optimizer? Thanks.

You can set requires_grad on the tensor, e.g. node_features.requires_grad_(),
and do optim = SGD(list(model.parameters()) + [node_features]) to include it in the optimizer.
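Putting that together, a minimal sketch (the nn.Linear here is only a stand-in for your actual DGI model):

```python
import torch
import torch.nn as nn
from torch.optim import SGD

# Stand-in encoder; in practice this would be your DGI model.
model = nn.Linear(128, 64)

# Randomly initialized node features, made trainable in place.
node_features = torch.randn(1000, 128)
node_features.requires_grad_()

# Hand the feature tensor to the optimizer alongside the model parameters.
optim = SGD(list(model.parameters()) + [node_features], lr=0.01)
```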

For the loss explosion problem, I would suggest trying a GAT or GraphSAGE model instead of GCN. Since sampling is involved in the training process, GCN might not be stable in such cases.
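For instance, a minimal GraphSAGE encoder that could stand in for the GCN encoder in the DGI example might look like this (a sketch; the layer sizes, aggregator type, and activation are placeholders):

```python
import torch.nn as nn
from dgl.nn import SAGEConv

class SAGEEncoder(nn.Module):
    """Sketch of a GraphSAGE encoder to replace DGI's GCN encoder."""
    def __init__(self, in_feats, hid_feats, n_layers=2):
        super().__init__()
        self.layers = nn.ModuleList([SAGEConv(in_feats, hid_feats, 'mean')])
        for _ in range(n_layers - 1):
            self.layers.append(SAGEConv(hid_feats, hid_feats, 'mean'))
        self.act = nn.PReLU(hid_feats)

    def forward(self, g, feats):
        h = feats
        for layer in self.layers:
            h = self.act(layer(g, h))
        return h
```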

Thank you for the reply. Yes, sampling might be the cause of the loss explosion, so I trained the network on the whole graph without minibatching, but the loss explosion still occurs after a few hundred epochs.

As for GAT, I don't know how to combine the learned attention weights with a predefined edge weight. Can you help me? Thanks.

You don't need a predefined edge weight for GAT; the attention weights are computed from the node features.
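For example, with DGL's GATConv the per-edge attention comes directly from the features (a toy sketch; the graph and dimensions are made up):

```python
import torch
import dgl
from dgl.nn import GATConv

# Toy graph; self-loops are added because GATConv rejects zero-in-degree nodes.
g = dgl.add_self_loop(dgl.rand_graph(10, 40))
feat = torch.randn(10, 16)

conv = GATConv(in_feats=16, out_feats=8, num_heads=2)
out, attn = conv(g, feat, get_attention=True)
print(out.shape)   # (10, 2, 8): per-node, per-head outputs
print(attn.shape)  # (num_edges, 2, 1): attention weight on every edge
```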

Hi @VoVAllen, I want to learn paper embeddings from a citation-coupling graph, so I need to add the coupling strength as an edge weight.

If paper A cites papers [C, D, E] and paper B cites papers [D, E, F, G], then the coupling strength between A and B is 2 (the size of the shared reference set {D, E}). I normalized the coupling strength by dividing by the total number of references of each paper.
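In code, the raw strength is just the overlap size; the normalization below uses the geometric mean of the two reference counts, which is one reading of "dividing by the total references of each paper":

```python
# Hypothetical helper: bibliographic coupling strength between two papers.
# Normalization by the geometric mean of the reference counts is an
# assumption; swap in whichever normalization you actually use.
def coupling_strength(refs_a, refs_b):
    shared = len(set(refs_a) & set(refs_b))
    norm = (len(refs_a) * len(refs_b)) ** 0.5
    return shared / norm if norm else 0.0

# The example above: A cites [C, D, E], B cites [D, E, F, G] -> raw overlap 2.
print(coupling_strength(['C', 'D', 'E'], ['D', 'E', 'F', 'G']))  # ≈ 0.577
```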

Hi there, another question: if I want to get node embeddings after training, should I call forward(g) to compute the representations, or use the node_features matrix directly?

Either way works. A common approach uses the learned node embeddings, i.e. node_features directly. I've also seen people use the last GCN layer's output as the embedding for downstream tasks.
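Both options, sketched with toy stand-ins for the trained encoder, graph, and feature matrix:

```python
import torch
import dgl
from dgl.nn import SAGEConv

# Toy stand-ins; in practice these are your trained encoder and real graph.
g = dgl.add_self_loop(dgl.rand_graph(100, 500))
node_features = torch.randn(100, 128)
encoder = SAGEConv(128, 64, 'mean')

# Option 1: recompute representations with the (frozen) trained encoder.
encoder.eval()
with torch.no_grad():
    embeddings = encoder(g, node_features)

# Option 2: if node_features were learned directly, detach and use them as-is.
raw_embeddings = node_features.detach()
```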

You can add the strength you defined as part of the loss.

Hi @VoVAllen, can you describe in detail how to add the strength to the loss? I am currently using the Deep Graph Infomax framework for unsupervised training.

How did you calculate this coupling strength? Can it be backpropagated?

No, I predefined it when building the graph.
