Hello,
I have a simple model with three GraphConv layers and one Linear layer, and I am trying to make predictions over a graph with 700 vertices (560 training vertices, in a semi-supervised GCN setting). About 10 days ago, when I first wrote it, I was consistently getting high training/test accuracies of >= 95%. Then something changed one day and I can't figure out what (it is such an extremely simple model): the loss would either go down painfully slowly or start oscillating, and the accuracy dropped below 40%. Even that wasn't consistent. Once every 10-15 runs the accuracy would shoot up above 85% and then fall back down. All of this was on the exact same training set (same training vertices).
This made me suspect that something was up with the random state, so I set torch.manual_seed(0) at the start of the program. Now the accuracy is back at 95%, and, as expected with a fixed RNG seed, it is consistent across runs. So my hunch about this being related to RNG was right, but the fix feels like a band-aid on the symptom rather than a real root-cause analysis. Why the model starts from such a terrible state without the seed is simply beyond me. I also highly doubt that my simple use case has uncovered an underlying PyTorch/DGL bug, so something must be up with my model.
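(For reference, torch.manual_seed only pins PyTorch's own RNG. A fuller seeding setup would look roughly like the sketch below; the numpy/CUDA/dgl.seed calls are extras I am not currently using, included just to show what else could vary between runs.)

```python
import random
import numpy as np
import torch
import dgl

# Pin every RNG the training script might touch, not just PyTorch's.
seed = 0
random.seed(seed)                 # Python's built-in RNG
np.random.seed(seed)              # numpy (e.g. data shuffling/splits)
torch.manual_seed(seed)           # PyTorch CPU RNG (what I set now)
torch.cuda.manual_seed_all(seed)  # PyTorch CUDA RNGs, if training on GPU
dgl.seed(seed)                    # DGL's own random operations
```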
Does anyone have an obvious hunch, even without looking at the actual code? If you need to see the model itself, I can post it here, but really it is as simple as:
nn.ReLU()(dgl.nn.pytorch.GraphConv(10, 10)) x 3
nn.Sigmoid()(nn.Linear(10, 4))
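Spelled out, the model is roughly the following sketch (the class name, the hidden size of 10, and the use of plain torch.relu/torch.sigmoid are my shorthand here, not necessarily the exact code):

```python
import torch
import torch.nn as nn
from dgl.nn.pytorch import GraphConv

class SimpleGCN(nn.Module):
    """Three GraphConv layers with ReLU, then a Linear layer with Sigmoid."""

    def __init__(self, in_feats=10, hidden=10, n_classes=4):
        super().__init__()
        self.conv1 = GraphConv(in_feats, hidden)
        self.conv2 = GraphConv(hidden, hidden)
        self.conv3 = GraphConv(hidden, hidden)
        self.fc = nn.Linear(hidden, n_classes)

    def forward(self, g, x):
        # g: the DGLGraph, x: node features of shape (num_nodes, in_feats)
        x = torch.relu(self.conv1(g, x))
        x = torch.relu(self.conv2(g, x))
        x = torch.relu(self.conv3(g, x))
        return torch.sigmoid(self.fc(x))
```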
I originally posted this question in the PyTorch forum: