RuntimeError: DataLoader worker (pid(s) 19716) exited unexpectedly

I keep running into the following error while using DGL on CPU only:

Empty                                     Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py in _try_get_data(self, timeout)
    871         try:
--> 872             data = self._data_queue.get(timeout=timeout)
    873             return (True, data)

C:\ProgramData\Anaconda3\lib\multiprocessing\queues.py in get(self, block, timeout)
    107                     if not self._poll(timeout):
--> 108                         raise Empty
    109                 elif not self._poll():

Empty: 

The above exception was the direct cause of the following exception:

RuntimeError                              Traceback (most recent call last)
<ipython-input-30-ce301cad9c1c> in <module>
      1 results = []
      2 with torch.no_grad():
----> 3     for x,_ in progress_bar(dl):
      4         x = x.to(torch.device("cpu")) #cuda"))
      5         results.append(m(x))

C:\ProgramData\Anaconda3\lib\site-packages\fastprogress\fastprogress.py in __iter__(self)
     45         except Exception as e:
     46             self.on_interrupt()
---> 47             raise e
     48 
     49     def update(self, val):

C:\ProgramData\Anaconda3\lib\site-packages\fastprogress\fastprogress.py in __iter__(self)
     39         if self.total != 0: self.update(0)
     40         try:
---> 41             for i,o in enumerate(self.gen):
     42                 if i >= self.total: break
     43                 yield o

C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py in __next__(self)
    433         if self._sampler_iter is None:
    434             self._reset()
--> 435         data = self._next_data()
    436         self._num_yielded += 1
    437         if self._dataset_kind == _DatasetKind.Iterable and \

C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py in _next_data(self)
   1066 
   1067             assert not self._shutdown and self._tasks_outstanding > 0
-> 1068             idx, data = self._get_data()
   1069             self._tasks_outstanding -= 1
   1070             if self._dataset_kind == _DatasetKind.Iterable:

C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py in _get_data(self)
   1032         else:
   1033             while True:
-> 1034                 success, data = self._try_get_data()
   1035                 if success:
   1036                     return data

C:\ProgramData\Anaconda3\lib\site-packages\torch\utils\data\dataloader.py in _try_get_data(self, timeout)
    883             if len(failed_workers) > 0:
    884                 pids_str = ', '.join(str(w.pid) for w in failed_workers)
--> 885                 raise RuntimeError('DataLoader worker (pid(s) {}) exited unexpectedly'.format(pids_str)) from e
    886             if isinstance(e, queue.Empty):
    887                 return (False, None)

RuntimeError: DataLoader worker (pid(s) 19716) exited unexpectedly

This is my code:

import torch
import dgl
import dgl.dataloading.pytorch
from fastprogress.fastprogress import progress_bar

ds = InferenceDataset(combinations)

def my_collate(t):
    xs, ys = zip(*t)
    batched_g = dgl.batch(xs)
    batched_g.shape = (len(xs), 1)
    return batched_g, torch.stack(ys)

m = ClassifierInference(item_embeddings)

m.load_state_dict(torch.load("models/unfreeze-1-epoch-naive.pth"))



dl = dgl.dataloading.pytorch.GraphDataLoader(ds, batch_size=512, num_workers=2)

results = []
with torch.no_grad():
    for x,_ in progress_bar(dl):
        x = x.to(torch.device("cpu"))  # error occurs on this line
        results.append(m(x))

I’m running fastai 2.3.1, torch 1.7.1, and dgl 0.6.1.
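For context, the traceback shows the loop running inside a Jupyter session on Windows, where DataLoader workers are launched with the multiprocessing spawn start method: everything a worker needs (the dataset and the collate function) must be picklable and importable in the child process, which is why setting num_workers=0 is the usual quick workaround for this class of error. A minimal stdlib sketch of that constraint (the names square, double, and picklable are illustrative, not from the post):

```python
import pickle

# Common workaround for the error above (hypothetical, mirrors the post's loader):
#   dl = dgl.dataloading.pytorch.GraphDataLoader(ds, batch_size=512, num_workers=0)

def square(x):
    """Module-level function: picklable by reference, safe to ship to a spawned worker."""
    return x * x

double = lambda x: 2 * x  # lambdas cannot be pickled by reference

def picklable(obj):
    """Return True if obj survives pickling, as a spawned DataLoader worker requires."""
    try:
        pickle.dumps(obj)
        return True
    except (pickle.PicklingError, AttributeError, TypeError):
        return False

if __name__ == "__main__":
    # Under the spawn start method (the Windows default) the child process
    # re-imports this module, so any worker setup must sit behind this guard.
    print(picklable(square))  # True: top-level def resolves by qualified name
    print(picklable(double))  # False: a lambda has no importable name
```

Objects defined in a notebook cell are in the same position as the lambda here: the spawned worker cannot re-import them, so the worker dies and the parent raises the "exited unexpectedly" RuntimeError.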

Looks like the same problem as “CUDA: device-side assert triggered”.

What are InferenceDataset and ClassifierInference? And what are combinations and item_embeddings? I don’t know whether DGL is readily compatible with fastai, so a minimal reproducible example would be very helpful.
