GraphSAGE Model Quality Issue?

Hi, Folks,

I have been looking into the model quality of DGL’s GraphSAGE implementation, the pre-built one that ships in the examples folder (dgl/node_classification.py at master · dmlc/dgl · GitHub). On the ogbn-arxiv dataset, with the default hyperparameters from DGL’s implementation, GraphSAGE reaches an accuracy of 0.55 on the test set. In contrast, the OGB leaderboard shows an accuracy of 0.71 for a typical GraphSAGE implementation.

Is this a known issue? I did a bit of comparison, but I am not sure what I missed.

Any suggestions?

[Screenshot: OGB leaderboard]

[Screenshot: output from running DGL’s node_classification.py]

The example is mainly meant as a tutorial. I think you need to tune it on your own, or use the same parameters as the leaderboard entry.

Sure. I did set the hyperparameters to roughly match. I will keep tuning if needed.

Also, I checked a few other datasets and did similar comparisons. ogbn-arxiv seems to be an anomaly, which is why I am reporting this finding.

ogbn-products: DGL: 0.779, Leaderboard: 0.7829 (similar)
reddit: DGL: 0.96, Leaderboard: 0.94 (similar)
ogbn-arxiv: DGL: 0.55, Leaderboard: 0.71 (big discrepancy here)

For all the datasets, I used the dgl.data interface. Given the comparison above, I tend to believe that the ogbn-arxiv pre-processing does something different (?), and that this difference triggers a bug somewhere ONLY for ogbn-arxiv. I traced a bit into the implementation, but couldn’t find anything suspicious so far.

You mean you don’t need to tune for the other datasets to obtain comparable accuracy? Just using the defaults is sufficient?

Yes, that’s correct.

Have you added reverse edges for the dataset?

No. I didn’t.

I don’t have enough context to understand the suggestion. Are you suggesting that for ogbn-arxiv we should always add reverse edges before using the dataset? (If so, would it make sense to add the edges during the pre-processing step?)

Yes, I wonder if this is because the dataset only has edges in one direction, in which case you need to add the reverse edges in the pre-processing step, like here.
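A quick way to test this hypothesis is to measure how many edges already have a reverse counterpart. The sketch below is plain Python over an edge list; `reverse_edge_fraction` is a hypothetical helper name, not a DGL function. A value near 0 would suggest that the graph stores each edge in one direction only.

```python
def reverse_edge_fraction(src, dst):
    """Fraction of distinct edges (u, v) whose reverse (v, u) is also present.

    src and dst are equal-length sequences of plain Python ints.
    Self-loops (u, u) count as having a reverse.
    """
    edges = set(zip(src, dst))
    if not edges:
        return 0.0
    mutual = sum((v, u) in edges for (u, v) in edges)
    return mutual / len(edges)


# A directed 3-cycle has no reverse edges; a two-way pair is fully mutual.
one_way = reverse_edge_fraction([0, 1, 2], [1, 2, 0])
two_way = reverse_edge_fraction([0, 1], [1, 0])
```

With a DGL graph `g`, the edge lists can be obtained with `src, dst = g.edges()` and passed in as `src.tolist()` and `dst.tolist()`. The conversion to plain ints matters because tensor elements do not hash by value.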


You are correct. After I added reverse edges, the accuracy improved significantly.
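For readers who land here with the same problem, a minimal sketch of the fix follows. `symmetrize_edge_list` is a hypothetical pure-Python helper that only illustrates what adding reverse edges does to an edge list. `load_bidirected_arxiv` is likewise a hypothetical wrapper: it assumes `dgl` and `ogb` are installed and uses `dgl.to_bidirected`, which is not necessarily what was run in this thread.

```python
def symmetrize_edge_list(src, dst):
    """Return (src, dst) lists holding every edge (u, v) plus its reverse (v, u).

    Duplicates are removed and the result is sorted for determinism.
    """
    edges = set(zip(src, dst))
    edges |= {(v, u) for (u, v) in edges}
    ordered = sorted(edges)
    return [u for u, _ in ordered], [v for _, v in ordered]


def load_bidirected_arxiv():
    """Load ogbn-arxiv and add reverse edges (assumes dgl and ogb are installed)."""
    # Imported lazily so the helper above works without these packages.
    import dgl
    from ogb.nodeproppred import DglNodePropPredDataset

    dataset = DglNodePropPredDataset("ogbn-arxiv")
    g, labels = dataset[0]
    # copy_ndata=True carries the node features over to the new graph.
    g = dgl.to_bidirected(g, copy_ndata=True)
    return g, labels


# Toy example: the path 0 -> 1 -> 2 becomes traversable in both directions.
sym_src, sym_dst = symmetrize_edge_list([0, 1], [1, 2])
```

The likely reason this matters: in a citation graph stored in one direction, message passing along edge direction only aggregates from a node's in-neighbors, so nodes with no incoming edges receive no neighborhood information.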