How to get better Test Accuracy?

paulfryer · April 2, 2021, 8:44pm

I’ve got a graph that has 18 edges and 15 nodes. I’m training it with SageMaker, using an ml.m5.2xlarge instance. It’s taking over an hour. I’m getting close to 100% “Training Accuracy” within 10 or so epochs. The training objective I’m using is “Test Accuracy”, which is low, around 0.16.

Train Accuracy: 0.9850 | Train Loss: 0.0430
Epoch 00012:00026 | Train Forward Time(s) 0.0162 | Backward Time(s) 0.1528
Validation Accuracy: 0.1603 | Validation loss: 9.4638

What kinds of things should I be doing to address this? It seems like it’s overfitting the data. I’m using an HPO job on SageMaker, so any specific guidance with that would be great. Thanks.
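For anyone wanting to track the gap between training and validation accuracy during tuning, here is a minimal sketch that pulls the numbers out of log lines like the ones above. The list of `Name`/`Regex` dictionaries follows the general shape that SageMaker metric definitions take, but the metric names and the `extract_metrics` helper are made up for illustration, and the regexes assume the exact log format shown in this post.

```python
import re

# Log lines copied from the training output quoted above.
LOG = """\
Train Accuracy: 0.9850 | Train Loss: 0.0430
Epoch 00012:00026 | Train Forward Time(s) 0.0162 | Backward Time(s) 0.1528
Validation Accuracy: 0.1603 | Validation loss: 9.4638
"""

# Hypothetical metric names; the regexes assume the log format shown above.
METRIC_DEFINITIONS = [
    {"Name": "train:accuracy", "Regex": r"Train Accuracy: ([0-9.]+)"},
    {"Name": "validation:accuracy", "Regex": r"Validation Accuracy: ([0-9.]+)"},
    {"Name": "validation:loss", "Regex": r"Validation loss: ([0-9.]+)"},
]


def extract_metrics(log_text, definitions=METRIC_DEFINITIONS):
    """Return {metric name: last value found in the log text}."""
    metrics = {}
    for definition in definitions:
        matches = re.findall(definition["Regex"], log_text)
        if matches:
            # Keep the most recent value if the metric appears several times.
            metrics[definition["Name"]] = float(matches[-1])
    return metrics


metrics = extract_metrics(LOG)
# A large positive gap is the usual symptom of overfitting.
gap = metrics["train:accuracy"] - metrics["validation:accuracy"]
print(metrics, f"train/validation gap: {gap:.4f}")
```

If the tuning job is set up to optimize a validation metric extracted this way rather than a training metric, the search should favor configurations that generalize instead of ones that only fit the training set.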

paulfryer · April 2, 2021, 8:50pm

Here are the hyperparameters for one of the jobs:

mufeili · April 3, 2021, 8:35am

Did you use Neptune ML or pure SageMaker? Can you try selecting the hyperparameters and performing early stopping based on the validation accuracy?
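As a rough illustration of early stopping on validation accuracy, the sketch below stops training once the validation accuracy has not improved for a fixed number of epochs and remembers the best epoch. The `EarlyStopping` class, its `patience` and `min_delta` parameters, and the accuracy values in `history` are all hypothetical and made up for this example. This is not the Neptune ML or SageMaker implementation.

```python
class EarlyStopping:
    """Stop when validation accuracy has not improved for `patience` epochs."""

    def __init__(self, patience=3, min_delta=0.0):
        self.patience = patience      # epochs to wait without improvement
        self.min_delta = min_delta    # minimum increase that counts as improvement
        self.best_score = None
        self.best_epoch = None
        self.bad_epochs = 0

    def step(self, epoch, val_accuracy):
        """Record one epoch; return True when training should stop."""
        improved = (
            self.best_score is None
            or val_accuracy > self.best_score + self.min_delta
        )
        if improved:
            self.best_score = val_accuracy
            self.best_epoch = epoch
            self.bad_epochs = 0
            return False
        self.bad_epochs += 1
        return self.bad_epochs >= self.patience


# Made-up validation accuracies, for illustration only (not real results).
history = [0.10, 0.14, 0.16, 0.15, 0.16, 0.13, 0.12]

stopper = EarlyStopping(patience=3)
stopped_at = None
for epoch, acc in enumerate(history):
    # In a real loop, train for one epoch and evaluate on the validation set here.
    if stopper.step(epoch, acc):
        stopped_at = epoch
        break

print(f"stopped at epoch {stopped_at}, best epoch {stopper.best_epoch}")
```

In a real training loop you would also save a model checkpoint whenever `step` records a new best score, and restore that checkpoint after stopping.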

paulfryer · April 3, 2021, 11:34pm

I used the format Neptune ML uses. I actually did an export from a relational database to the format Neptune ML uses, then submitted the pre-processing and auto-trainer jobs the same way Neptune ML does.

classicsong · April 6, 2021, 2:39am

Did you export the data from the RDB, load it into Neptune, and then launch the Neptune ML pipeline?
BTW, can you provide more details about your data? Do you have node properties (features)?
What task are you targeting? Node classification or link prediction?
Also, 18 edges and 15 nodes is a tiny graph.

paulfryer · April 6, 2021, 2:48pm

I exported data from the RDB to S3 as CSV files, with individual files for nodes and edges - just like the Neptune export process does. I then ran the preprocessing jobs and then the auto-trainer job, just like the Neptune ML pipeline does.

The 18 and 15 are the number of “types” of edges and nodes, respectively. The actual node and edge counts are:
num_nodes=3123700, num_edges=13187452

The target task is classification. Attached is a screenshot that shows the number of classes and some of the training parameters.

classicsong · April 6, 2021, 4:14pm

Your validation accuracy is also very low. It seems the model is overfitting.
Can you show me your preprocessing configuration file? BTW, can you join DGL’s Slack channel so we can discuss it there?