Has anyone reproduced the results of this experiment?

I used the same dataset and the same code as this example.
The reported validation accuracy is 0.701, but I got a validation accuracy of 0.48.
Is there anything else I should do with the original code?

Initializing data loader...
Loading model...
100%|██████████| 8685/8685 [20:21<00:00, 7.11it/s]
Validation accuracy: 0.4814500284276965
100%|██████████| 9177/9177 [18:57<00:00, 8.07it/s]

@rusty1s suggested checking the full_feature.npy file and gave a new download link.
I downloaded another full_feature.npy and compared the MD5 checksums: they are identical.
Both full_feature.npy files have MD5 d98ffac92986de2fdaabc3fe44ced36c.
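For reference, here is a minimal sketch of how I computed the checksum from Python (the file name follows this thread; adjust the path as needed):

    import hashlib

    def md5_of(path, chunk_size=1 << 20):
        # Stream the file in 1 MB chunks so the large feature file never has to fit in memory.
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()

    print(md5_of("full_feature.npy"))  # should print d98ffac92986de2fdaabc3fe44ced36c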

Is there any other problem?

I tried both train.py and train_multi_gpus.py in [dgl/examples/pytorch/ogb_lsc/MAG240M at master · dmlc/dgl · GitHub] and got almost the same result, 0.48.

What were your PyTorch and CUDA versions? I used DGL 0.6.1 with PyTorch 1.8.1 + CUDA 10.2 and got a validation accuracy of 0.634 at the first epoch.

Also, did you make any changes to the code? And how did you get the graph.dgl and full.npy files (by downloading them or by running the preprocessing script yourself)?

Hi BarclayII,
My versions are:
dgl-cuda10.2 0.6.1
pytorch 1.8.1
cudatoolkit 10.2.89
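(For reference, these can also be double-checked from inside Python with a quick sketch like the following, assuming dgl and torch import cleanly in the environment used for training:)

    import dgl
    import torch

    # Print the library versions and the CUDA toolkit this PyTorch build targets.
    print("DGL:", dgl.__version__)
    print("PyTorch:", torch.__version__)
    print("CUDA (torch build):", torch.version.cuda)
    print("CUDA available:", torch.cuda.is_available())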

I checked train.py and utils.py, and I didn’t change any code.
I got graph.dgl and full.npy by downloading them.

My first epoch’s accuracy is 0.11; the logs are as follows:
Loading graph
Loading features
Loading masks and labels
Initializing dataloader...
Initializing model...
100%|██████████| 1087/1087 [18:12<00:00, 1.01s/it, loss=3.8817, acc=0.1372]
100%|██████████| 136/136 [02:16<00:00, 1.01s/it]
Validation accuracy: 0.11353806072731722
Updating best model...

This is very strange. My environment and files are basically the same as yours, yet we get completely different results.

Would you mind creating another Python environment (either with conda or virtualenv) and trying again?

FYI, my GPUs are 8 T4s and I ran train_multi_gpus.py.

I tried on two machines, each with a conda environment:
one running train_multi_gpus.py and the other running train.py.
Both results are in the 0.4 range.

What are the MD5 checksums of your files?
Mine are:
d98ffac92986de2fdaabc3fe44ced36c  full_feat.npy
1806459f6acc03eeacf6a89e65a000c3  graph.dgl

I see. You should use full.npy instead of full_feat.npy to run DGL’s baseline, as the order of the node features is different from OGB’s. From the README on full_feat.npy:

Note that the features are concatenated in the order of paper, author, and institution, unlike the one in our baseline code.

The MD5 of full.npy is 08f79c973740f1f043e0900f5d8543ab.
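To make that ordering note concrete, here is a rough sketch (not the baseline code) of how the rows of OGB’s full_feat.npy are laid out, assuming ogb is installed and the MAG240M metadata is available locally; the root path "dataset" is an assumption:

    import numpy as np
    from ogb.lsc import MAG240MDataset

    # Memory-map the file: it is far too large to load into RAM at once.
    feats = np.load("full_feat.npy", mmap_mode="r")

    # Node counts come from the OGB dataset metadata.
    dataset = MAG240MDataset(root="dataset")
    n_paper = dataset.num_papers
    n_author = dataset.num_authors
    n_inst = dataset.num_institutions

    # full_feat.npy stacks the features per node type in the order paper, author, institution,
    # which is why it cannot be swapped in for DGL's full.npy directly.
    paper_feat = feats[:n_paper]
    author_feat = feats[n_paper:n_paper + n_author]
    inst_feat = feats[n_paper + n_author:]
    print(paper_feat.shape, author_feat.shape, inst_feat.shape)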

Many thanks!
I will try again with full.npy.
