How to get a better model

When I use the GCN or GraphSAGE model from dgl/examples to train on my dataset for semi-supervised binary node classification, I find that the validation/test results are not stable, and I thought this was because of insufficient training. But when I set a very large number of training epochs, the validation/test results actually get worse even as the loss drops to a small value.
So is this caused by overfitting? I am really confused that the loss and the accuracy both go down. Does it mean my dataset (mask label 0 : mask label 1 = 1:1) is not good? Or should I change the loss function (currently torch.nn.CrossEntropyLoss())?

In addition to the training loss and test accuracy, can you also plot curves for the training and validation accuracies? If the training accuracy increases while the validation/test accuracies decrease, then this is likely an overfitting issue and you may need to tune things like early stopping, L2 regularization and dropout.
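A minimal sketch for collecting and plotting these curves (the lists are hypothetical placeholders that you would append to once per epoch in your training loop):

    import matplotlib.pyplot as plt

    # Hypothetical per-epoch records; append to these inside your training loop.
    train_accs, val_accs, test_accs = [], [], []

    plt.plot(train_accs, label="train acc")
    plt.plot(val_accs, label="val acc")
    plt.plot(test_accs, label="test acc")
    plt.xlabel("epoch")
    plt.ylabel("accuracy")
    plt.legend()
    plt.savefig("accuracy_curves.png")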

@mufeili Thanks for your reply. I have monitored the validation accuracies as well as the test accuracies, and both of them decreased. I have other training experience, e.g. with a LightGBM model or a TensorFlow CNN model, but neither of them showed such unstable performance. Is there any example of doing early stopping when the loss and accuracy are unstable?

This example may not be exactly what you want, but it does contain a part for early stopping. Basically, we check whether the validation accuracy has stopped improving for a pre-specified number of epochs; if so, early stopping is performed. For test evaluation, we load the saved model that achieved the highest validation accuracy during training.
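A minimal sketch of that logic (model, train_one_epoch and evaluate are hypothetical placeholders for your own training and evaluation code):

    import torch

    patience, best_val_acc, wait = 20, 0.0, 0
    for epoch in range(1000):
        train_one_epoch(model)
        val_acc = evaluate(model, split="val")
        if val_acc > best_val_acc:
            best_val_acc, wait = val_acc, 0
            # Checkpoint the model with the highest validation accuracy so far.
            torch.save(model.state_dict(), "best_model.pt")
        else:
            wait += 1
            if wait >= patience:
                break  # early stopping: no improvement for `patience` epochs

    # Load the best checkpoint for the final test evaluation.
    model.load_state_dict(torch.load("best_model.pt"))
    test_acc = evaluate(model, split="test")

A larger patience makes early stopping more tolerant of noisy validation curves, at the cost of longer training.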

@mufeili Following your advice, I used https://github.com/dmlc/dgl/blob/master/examples/pytorch/graphsage/train_sampling_multi_gpu.py and https://github.com/dmlc/dgl/blob/master/examples/pytorch/gcn/train.py to train. The results are as follows:

              precision    recall  f1-score   support

     class_0       0.89      0.94      0.91     42498
     class_1       0.32      0.19      0.24      6212

    accuracy                           0.85     48710
   macro avg       0.60      0.56      0.58     48710
weighted avg       0.82      0.85      0.83     48710

As you can see, the performance on class_1 is bad. I think this may be caused by the imbalance between label-0 and label-1 nodes on the graph (0 : 1 : unknown = 425001 : 62133 : 2109839). Is there any way to improve the performance?

See if the technique described in this article helps: https://towardsdatascience.com/handling-imbalanced-datasets-in-deep-learning-f48407a0e758

Hey @yang,

How do you generate those details? I need them to evaluate my model.

Thanks
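A report in that format is what scikit-learn's classification_report prints; a minimal sketch with hypothetical label arrays (in practice these would be the ground-truth labels and predicted classes for the test nodes):

    from sklearn.metrics import classification_report

    # Hypothetical ground truth and predictions for the test nodes;
    # in practice, e.g., labels[test_mask] and logits[test_mask].argmax(1).
    y_true = [0, 0, 1, 1, 0, 1]
    y_pred = [0, 0, 1, 0, 0, 0]
    print(classification_report(y_true, y_pred, target_names=["class_0", "class_1"]))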

Thanks for the pointer @mufeili,

However, I’m afraid I cannot access that article because I’m not a member.

Does this one work? https://www.kdnuggets.com/2020/01/5-most-useful-techniques-handle-imbalanced-datasets.html

Thanks @mufeili. It does work, and I have a question about it.
I believe I can apply random over/under sampling to the giant graph I want to train on, and I think it will not cause any problems if I do node classification based only on node features.

However, if I do node classification that also takes edge features into account and I oversample, how should I treat the new edges (their features and their connections) after adding more nodes through oversampling?

Thanks so much

If you are working on node classification, I think you can simply emulate down sampling/up sampling by weighting the loss terms for different nodes in the loss computation. In this way, you do not need to explicitly modify the graph topology or features.

Thanks @mufeili,

By weighting the loss terms, do you mean modifying CrossEntropyLoss() or whichever loss function I use in training?

Do you have a reference or example of doing such a thing?

Thanks

Take a look at the pos_weight argument of PyTorch’s BCEWithLogitsLoss and see if it’s applicable here.
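A minimal sketch of both options, using the class counts from earlier in the thread (the logits and labels are hypothetical placeholders for your model outputs and ground truth):

    import torch
    import torch.nn as nn

    # Label counts reported earlier in the thread:
    # class 0 : class 1 ~= 425001 : 62133.
    counts = torch.tensor([425001.0, 62133.0])

    # Option 1: keep two-class logits and CrossEntropyLoss, weighting the
    # classes by inverse frequency so mistakes on class-1 nodes cost more.
    class_weights = counts.sum() / (2.0 * counts)
    ce = nn.CrossEntropyLoss(weight=class_weights)
    logits2 = torch.randn(8, 2)            # hypothetical [num_nodes, 2] outputs
    labels = torch.randint(0, 2, (8,))     # hypothetical ground-truth labels
    loss_ce = ce(logits2, labels)

    # Option 2: have the model emit a single logit per node and use
    # BCEWithLogitsLoss; pos_weight scales the loss of positive (class-1)
    # examples, and a common choice is the negative-to-positive ratio.
    pos_weight = torch.tensor([425001.0 / 62133.0])  # ~6.84
    bce = nn.BCEWithLogitsLoss(pos_weight=pos_weight)
    logits1 = torch.randn(8)               # hypothetical single-logit outputs
    loss_bce = bce(logits1, labels.float())

Note that the second option assumes changing the model head from two outputs per node to one.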