Overlap data loading and computation with DistDGL

I want to overlap data loading and computation in DistDGL. I use a ThreadPoolExecutor to fetch the next mini-batch while the current mini-batch is being trained on. My code looks like this:

train_dataloader = dgl.dataloading.DistNodeDataLoader(
    g,
    train_nids,
    sampler,
    batch_size=args.batch_size,
    shuffle=True,
    drop_last=False,
)
from concurrent.futures import ThreadPoolExecutor

executor = ThreadPoolExecutor(max_workers=1)
...
dataload_iter = train_dataloader.__iter__()
batch_idx = 0
while True:
    if batch_idx == 0:
        # fetch the very first mini-batch synchronously
        batch = next(dataload_iter)
    else:
        # pick up the mini-batch prefetched during the previous iteration
        batch = future.result()

    # start fetching the next mini-batch in the background
    future = executor.submit(next, dataload_iter)

    # model train on `batch`
    batch_idx += 1

However, the code above has exactly the same execution time as the plain version below.

for batch_idx, (input_nodes, output_nodes,
                blocks) in enumerate(train_dataloader):
    # model train

This really confuses me. My guess is that the overlap cannot happen because of the GIL. Is there any way to overlap these two stages? Would multiprocessing help? Any advice would be appreciated. Thanks in advance.
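
For reference, this is roughly the multiprocessing-based variant I had in mind: a producer process iterates the dataloader and pushes mini-batches into a multiprocessing.Queue while the main process trains. It is only a sketch (prefetch_worker is a made-up helper, and I assume Linux fork semantics); I am not sure a DistNodeDataLoader can even be iterated from a child process because of DistDGL's RPC state.

import multiprocessing as mp

def prefetch_worker(dataloader, queue):
    # producer: fetch mini-batches and hand them to the trainer
    for minibatch in dataloader:
        queue.put(minibatch)
    queue.put(None)  # sentinel: no more batches

queue = mp.Queue(maxsize=2)
producer = mp.Process(target=prefetch_worker, args=(train_dataloader, queue))
producer.start()

while True:
    minibatch = queue.get()
    if minibatch is None:
        break
    input_nodes, output_nodes, blocks = minibatch
    # model train on `blocks`

producer.join()

Would something like this work with DistDGL, or is there a recommended built-in way to prefetch?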