GraphSAGE Model Quality Issue?

HuangLED · July 27, 2022, 11:15pm

Hi folks,

I have been looking into the quality of DGL’s GraphSAGE model, the pre-built one that ships in the examples folder (dgl/node_classification.py at master · dmlc/dgl · GitHub). On the ogbn-arxiv dataset, with the default hyperparameters that come with DGL’s implementation, GraphSAGE gets an accuracy of 0.55 on the test set. By contrast, OGB’s leaderboard shows an accuracy of 0.71 for a typical GraphSAGE implementation.

Is this a known issue? I did a bit of comparison, but I’m not sure what I missed.

Any suggestions?

Screenshot of the OGB leaderboard

Output I got from running dgl’s node_classification.py implementation.

Rhett-Ying · July 28, 2022, 12:14pm

It’s mainly meant as a tutorial. I think you need to tune it on your own, or use the same parameters as the leaderboard entry.

HuangLED · July 28, 2022, 4:36pm

Sure. I did set the hyperparameters to roughly match. Will keep tuning if needed.

I also checked a few other datasets and did similar comparisons. ogbn-arxiv seems to be an anomaly, which is why I am reporting this finding.

ogbn-products: DGL’s accuracy: 0.779, leaderboard: 0.7829 (similar)
reddit: DGL’s accuracy: 0.96, leaderboard: 0.94 (similar)
ogbn-arxiv: DGL’s accuracy: 0.55, leaderboard: 0.71 (big discrepancy here)

For all the datasets, I used the dgl.data interface. Given the comparison above, I tend to believe that ogbn-arxiv’s pre-processing does something different (?), and that this difference is causing a bug that shows up ONLY for ogbn-arxiv. I traced a bit into the implementation, but couldn’t find anything suspicious so far.

Rhett-Ying · July 29, 2022, 1:17am

You mean you don’t need to tune for the other datasets to obtain comparable accuracy? Just using the default hyperparameters is sufficient?

HuangLED · July 29, 2022, 1:32am

yes, that’s correct.

mufeili · August 1, 2022, 6:07am

Have you added reverse edges for the dataset?

HuangLED · August 1, 2022, 5:22pm

No, I didn’t.

I don’t have enough context to understand the suggestion here. Are you suggesting that for ogbn-arxiv we should always add reverse edges before using the dataset? (If so, would it make sense to add the edges during the pre-processing step?)

mufeili · August 2, 2022, 2:22am

Yes, I wonder if this is because the dataset only has edges in one direction, in which case you need to add the reverse edges in the pre-processing step like here.

HuangLED · August 4, 2022, 5:38am

You are correct. After I added reverse edges, the accuracy improved significantly.
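For anyone landing here later, here is a rough sketch of why the reverse edges matter. It is plain Python on a made-up three-paper citation graph, not the code from the DGL example; the helper names `add_reverse_edges` and `in_neighbors` are hypothetical. With edges in one direction only, a node that is never a destination has no in-neighbors, so a neighbor-aggregating model like GraphSAGE has nothing to aggregate for it.

```python
def add_reverse_edges(edges):
    """Return the edges plus the reverse of each edge, skipping duplicates."""
    seen = set()
    result = []
    for src, dst in edges:
        for edge in ((src, dst), (dst, src)):
            if edge not in seen:
                seen.add(edge)
                result.append(edge)
    return result


def in_neighbors(edges, num_nodes):
    """Map each node to the source nodes of its incoming edges."""
    neighbors = {node: [] for node in range(num_nodes)}
    for src, dst in edges:
        neighbors[dst].append(src)
    return neighbors


# Hypothetical toy graph: paper 0 cites papers 1 and 2, paper 1 cites paper 2.
citations = [(0, 1), (0, 2), (1, 2)]

directed = in_neighbors(citations, num_nodes=3)
bidirected = in_neighbors(add_reverse_edges(citations), num_nodes=3)

for node in range(3):
    print(node, "directed:", directed[node], "bidirected:", sorted(bidirected[node]))
```

In the directed version paper 0 receives no messages at all, while after adding reverse edges every paper sees every paper it is connected to. In DGL itself, `dgl.add_reverse_edges(g)` or `dgl.to_bidirected(g, copy_ndata=True)` can be applied to the graph before training; which one fits best depends on whether duplicate edges and edge features matter in your setup.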