Hi, I get an error when using the DataLoader with a GPU.
I run minibatch training on my own dataset and hit the following error:
---------------------------------------------------------------------------
Empty Traceback (most recent call last)
File ~/miniconda3/lib/python3.10/site-packages/dgl/dataloading/dataloader.py:499, in _PrefetchingIter._next_threaded(self)
498 try:
--> 499 batch, feats, stream_event, exception = self.queue.get(timeout=prefetcher_timeout)
500 except Empty:
File ~/miniconda3/lib/python3.10/queue.py:179, in Queue.get(self, block, timeout)
178 if remaining <= 0.0:
--> 179 raise Empty
180 self.not_empty.wait(remaining)
Empty:
During handling of the above exception, another exception occurred:
RuntimeError Traceback (most recent call last)
Cell In[6], line 3
1 for epoch in range(num_epoch):
2 start=time.time()
----> 3 for input_nodes, positive_graph, negative_graph, blocks in train_dataloader:
4 inner_start=time.time()
5 blocks = [b.to(torch.device(device)) for b in blocks]
File ~/miniconda3/lib/python3.10/site-packages/dgl/dataloading/dataloader.py:512, in _PrefetchingIter.__next__(self)
510 def __next__(self):
511 batch, feats, stream_event = \
--> 512 self._next_non_threaded() if not self.use_thread else self._next_threaded()
513 batch = recursive_apply_pair(batch, feats, _assign_for)
514 if stream_event is not None:
File ~/miniconda3/lib/python3.10/site-packages/dgl/dataloading/dataloader.py:501, in _PrefetchingIter._next_threaded(self)
499 batch, feats, stream_event, exception = self.queue.get(timeout=prefetcher_timeout)
500 except Empty:
--> 501 raise RuntimeError(
502 f'Prefetcher thread timed out at {prefetcher_timeout} seconds.')
503 if batch is None:
504 self.thread.join()
RuntimeError: Prefetcher thread timed out at 30 seconds.
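(For context on what the traceback means: the failure is just a `queue.Queue.get(timeout=...)` expiring before the prefetcher thread has produced a batch. A minimal stdlib sketch of the same pattern — the worker name and timeout value here are made up for illustration:)

```python
import queue
import threading
import time

# Sketch of the prefetcher pattern from the traceback above: a worker
# thread pushes batches into a queue, and the consumer waits with a
# timeout. If the worker is slower than the timeout, queue.Empty is
# raised, which DGL then re-raises as the RuntimeError shown.
PREFETCHER_TIMEOUT = 0.1  # seconds; DGL's value in the traceback is 30

def slow_worker(q):
    time.sleep(1.0)  # simulate sampling that takes longer than the timeout
    q.put("batch")

q = queue.Queue(maxsize=1)
threading.Thread(target=slow_worker, args=(q,), daemon=True).start()

raised = False
try:
    batch = q.get(timeout=PREFETCHER_TIMEOUT)
except queue.Empty:
    raised = True
    print(f"Prefetcher thread timed out at {PREFETCHER_TIMEOUT} seconds.")
```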
Some details of my environment and graph are below:
PyTorch version: 1.12.1+cu116
Graph details:
Graph(num_nodes={'item': 9432, 'user': 7120017},
      num_edges={('item', 'clicked-by', 'user'): 255122481, ('user', 'click', 'item'): 255122481},
      metagraph=[('item', 'user', 'clicked-by'), ('user', 'item', 'click')])
negative_sampler = dgl.dataloading.negative_sampler.Uniform(4)
sampler = dgl.dataloading.NeighborSampler([2, 2])
sampler = dgl.dataloading.as_edge_prediction_sampler(
    sampler, negative_sampler=negative_sampler)
train_eid_dict = {
    etype: g.edges(etype=etype, form='eid')
    for etype in g.canonical_etypes}
train_dataloader = dgl.dataloading.DataLoader(
    # The following arguments are specific to DataLoader.
    g,                   # The graph
    train_eid_dict,      # The edges to iterate over
    sampler,             # The neighbor sampler
    device='cuda',       # Put the MFGs on CPU or GPU
    # The following arguments are inherited from PyTorch DataLoader.
    batch_size=4194304,  # Batch size
    shuffle=True,        # Whether to shuffle the edges every epoch
    drop_last=False,     # Whether to drop the last incomplete batch
    num_workers=1,       # Number of sampler processes
)
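I wondered whether settings like the following would help — this is only a guess on my part, assuming the timeout comes from the very large batch (~4.2M seed edges may not be sampled within 30 s) and from using worker processes while sampling on the GPU (my understanding is that DGL's GPU-based sampling expects `num_workers=0`):

```python
# Hypothetical variant of the DataLoader call above (batch_size value
# chosen arbitrarily for illustration; assumes g, train_eid_dict and
# sampler are defined as in my snippet):
train_dataloader = dgl.dataloading.DataLoader(
    g,
    train_eid_dict,
    sampler,
    device='cuda',
    batch_size=65536,   # much smaller than 4194304
    shuffle=True,
    drop_last=False,
    num_workers=0,      # my assumption: required for GPU-side sampling
)
```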
Does anyone know how to fix this? Thanks in advance!