Reproducibility on DataLoader

Hi!
I’ve recently used dgl.dataloading.DataLoader in my training and I found that I can’t reproduce the results.
Every time I run next(iter(dataloader)) it samples different nodes and graphs.
I’ve already tried the method in Reproducibility, DataLoader: shuffle=True, using seeds, but it doesn’t seem to work.
I would like to know if there’s any solution to this, thanks!

Hi @Vincent,

I might know why you have this issue. For me, the previous solution still works

import dgl
import tqdm

fix_seed(10)
# first snippet of code
sampler = dgl.dataloading.MultiLayerNeighborSampler([15, 10])
dataloader = dgl.dataloading.DataLoader(
    graph=g,                # the graph to sample from
    indices=g.nodes(),      # the node IDs to iterate over in minibatches
    graph_sampler=sampler,  # the neighbor sampler -> how each node's neighborhood is sampled
    batch_size=256,         # size of each batch
    shuffle=True,           # whether to shuffle the node IDs at each epoch
    drop_last=False,        # whether to keep or drop the last incomplete batch
)
# second snippet of code
for batch in tqdm.tqdm(dataloader):
    input_nodes, output_nodes, blocks = batch  # blocks: one MFG per sampled layer
    print(input_nodes, output_nodes, blocks)
    break

with

import os
import random

import numpy as np
import torch
import dgl


def fix_seed(seed):
    '''
    Fix every random seed so results are reproducible.

    Args:
        seed: the seed value to use for all RNGs
    '''
    torch.manual_seed(seed)
    np.random.seed(seed)
    random.seed(seed)
    os.environ['PYTHONHASHSEED'] = str(seed)
    dgl.seed(seed)
    dgl.random.seed(seed)
    torch.cuda.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)
    torch.backends.cudnn.benchmark = False
    torch.use_deterministic_algorithms(True)
    os.environ['OMP_NUM_THREADS'] = '1'
    os.environ['MKL_NUM_THREADS'] = '1'
    torch.set_num_threads(1)

If you put the first snippet of code in the same cell as the second, you only need to call fix_seed once, but if you split the two snippets into different cells, then you need to call fix_seed in both cells.

Be careful: if you run the second snippet twice without re-running the first one, you will not see the same nodes appear, because you are still iterating through your dataloader. To always get the same nodes, you’ll need to re-initialize your dataloader.
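For example, here is a minimal sketch of that behaviour (assuming the dataloader from the snippets above):

fix_seed(10)
it = iter(dataloader)
first = next(it)                    # first (seeded) batch
second = next(it)                   # keeps iterating -> different nodes

fix_seed(10)
it = iter(dataloader)               # re-seeded, fresh iterator
again = next(it)                    # same first batch as before
print(torch.equal(first[0], again[0]))  # expect True: same input_nodes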

I hope it helps, tell me if it works.

(I am not from DGL support, I was the author of the topic you mentioned. I am just trying to help here :slight_smile: )


Hi @aure_bnp,

You saved my day! Previously I only called the fix_seed() function when generating the dataloader, and I ran next(iter(dataloader)) in another cell, so no wonder it didn’t work!
Now I have a new question: after generating the dataloader, I need to put fix_seed() in every single cell that uses it to keep things fixed, right?
Just like the training code below:

print('Start training...')
for epoch in tqdm(range(EPOCH)):
    start_time = time.time()
    print('Training loss:')
    model.train()
    total_loss = 0
    fix_seed(seed)
    for _, pos_g, neg_g, blocks in train_dataloader:
        ...  # training step

Another question: if I run my code from .py files rather than a .ipynb notebook, do I only need to call fix_seed() in main.py, or do I need to call it in every other file as well?

Again, I really appreciate your help, thanks a lot and I look forward to your reply!

Great! Happy to help :wink:
For the first question, I think if you do:

print('Start training...')
fix_seed(seed)
for epoch in tqdm(range(EPOCH)):
    start_time = time.time()
    print('Training loss:')
    model.train()
    total_loss = 0
    for _, pos_g, neg_g, blocks in train_dataloader:
        ...  # training step

It should work, but you can check this really quickly; I’ll let you do that.
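For instance, a quick check could look like this (a sketch, assuming the first element of each batch is a tensor of node IDs, as in your loop):

fix_seed(seed)
batch_a = next(iter(train_dataloader))
fix_seed(seed)
batch_b = next(iter(train_dataloader))
print(torch.equal(batch_a[0], batch_b[0]))  # expect True if sampling is reproducible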

Regarding the second question, I don’t know, as I never tried with .py files. But putting it in all the .py files will definitely make it work. You should try both: if it works with fix_seed only in main.py, that’s great; otherwise, put fix_seed in every file that creates a dataloader or iterates over one.
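If it helps, what I would try first in a script is a sketch like this (seed_utils and build_dataloader are hypothetical names, just for illustration):

# main.py
from seed_utils import fix_seed          # hypothetical module holding fix_seed

fix_seed(10)                             # seed once, before anything random happens
train_dataloader = build_dataloader(g)   # hypothetical helper defined in another file
for batch in train_dataloader:
    ...                                  # the whole run now starts from the same RNG state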


Thanks for the response. Other than the solution above, please also try our brand-new dataloader GraphBolt, tutorial here: 🆕 Stochastic Training of GNNs with GraphBolt — DGL 2.2.1 documentation. It is faster and more flexible to use.
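Roughly, the setup from that tutorial looks like this (an untested sketch from memory, please check the linked tutorial for the exact API):

import dgl.graphbolt as gb

gb_graph = gb.from_dglgraph(g)                  # convert to GraphBolt's sampling format
itemset = gb.ItemSet(g.nodes(), names="seeds")  # the seed nodes to iterate over
datapipe = gb.ItemSampler(itemset, batch_size=256, shuffle=True)
datapipe = datapipe.sample_neighbor(gb_graph, [15, 10])  # same fanouts as before
dataloader = gb.DataLoader(datapipe)

for minibatch in dataloader:
    print(minibatch)
    break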


Looks great! I’ll try this out. Thanks for the information!

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.