Examples of pytorch for train_sampling_unsupervised in graphsage

I'm having trouble with multi-GPU training of GraphSAGE (the DGL example), since I use a neural network to construct the graph before entering the mp.Process calls. The following CUDA re-initialization error occurs:

RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method

After I set the start method to 'spawn', another error occurs:

```
File "/home/xxx/examples/base_train_kmeans.py", line 568, in attr_graph
File "/home/robot/anaconda3/envs/cycada/lib/python3.6/multiprocessing/process.py", line 105, in start
    self._popen = self._Popen(self)
File "/home/robot/anaconda3/envs/cycada/lib/python3.6/multiprocessing/context.py", line 284, in _Popen
    return Popen(process_obj)
File "/home/robot/anaconda3/envs/cycada/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 32, in __init__
    super().__init__(process_obj)
File "/home/robot/anaconda3/envs/cycada/lib/python3.6/multiprocessing/popen_fork.py", line 19, in __init__
File "/home/robot/anaconda3/envs/cycada/lib/python3.6/multiprocessing/popen_spawn_posix.py", line 47, in _launch
    reduction.dump(process_obj, fp)
File "/home/robot/anaconda3/envs/cycada/lib/python3.6/multiprocessing/reduction.py", line 60, in dump
    ForkingPickler(file, protocol).dump(obj)
_pickle.PicklingError: Can't pickle <function run at 0x7f7970a157b8>: it's not the same object as __main__.run
```
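For reference, this kind of PicklingError usually means the name `run` was rebound after the function object was created (for example by a decorator or a re-definition), so pickle's lookup by qualified name finds a different object. A minimal stdlib-only repro (the `wrap` helper is made up for illustration):

```python
import pickle

def run():
    return "ok"

original = run          # keep a handle to the first function object

def wrap(fn):
    def inner():
        return fn()
    return inner

run = wrap(run)         # rebinding the module-level name breaks pickling

# pickle serializes functions by qualified name; the lookup now finds the
# wrapper, not the original object, so it refuses to serialize it:
try:
    pickle.dumps(original)
except pickle.PicklingError as exc:
    print(exc)          # mirrors the error in the traceback above
```

The 'spawn' start method pickles the target function the same way, which is why a `run` that has been wrapped or redefined after import triggers this error.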

According to this post and the suggestions of @BarclayII, I have tried the following:

  1. convert the GPU tensors to CPU/NumPy arrays (doesn't work)

I have the following questions:

  1. Should I account for the CUDA resources that the neural network occupies? I hit the CUDA re-initialization error at the point where `model = model.to(device)` was executed.
  2. @BarclayII's second suggestion is to construct the graph inside the run function rather than passing the graph as an input argument to run. Will this change affect the results of the run function in multi-processing mode?
    I follow the graphsage-unsupervised example and replace the Reddit dataset with my custom dataset, which is constructed by dgl.convert.from_scipy. I initialize the node features with `graph.ndata['feature'] = features`.
  3. I am confused about the start methods of torch multiprocessing. The torch official post suggests using 'spawn' or 'forkserver', but the dgl-graphsage example uses the default 'fork' mode. How do I correctly use these start methods, and what is the main difference between them? Any suggestions are welcome.
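On question 3, torch.multiprocessing is a thin wrapper around Python's multiprocessing, so the difference between start methods can be sketched with the standard library alone (`launch` and `run` here are made-up names): 'fork' clones the current process, including any initialized CUDA context, which is what breaks; 'spawn' and 'forkserver' start fresh interpreters and must pickle the worker function and its arguments, which is where the PicklingError comes from.

```python
import multiprocessing as mp

def run(rank, queue):
    # Module-level function: picklable, so it works under 'spawn' too.
    queue.put(rank * 2)

def launch(n_procs, start_method):
    # 'fork' copies the parent's memory (fast, but unsafe once CUDA has
    # been initialized); 'spawn'/'forkserver' start clean interpreters
    # and pickle the target function and its arguments instead.
    ctx = mp.get_context(start_method)  # scoped alternative to set_start_method
    queue = ctx.Queue()
    procs = [ctx.Process(target=run, args=(r, queue)) for r in range(n_procs)]
    for p in procs:
        p.start()
    results = sorted(queue.get() for _ in range(n_procs))
    for p in procs:
        p.join()
    return results

if __name__ == "__main__":
    print(launch(2, "spawn"))  # prints [0, 2]
```

Using `mp.get_context(...)` instead of the global `set_start_method(...)` avoids the "context has already been set" error when the start method is configured more than once.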

Thanks in advance!


Do not put the graph on the GPU in the subprocesses (workers); it's tricky to pass CUDA tensors between processes. And you can use the spawn method by adding torch.multiprocessing.set_start_method('spawn').
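A minimal sketch of that structure, assuming torch is available (`run`, `main`, and the tensor contents are made up): the parent keeps everything on CPU and never touches CUDA; each spawned worker picks its device and moves data there itself.

```python
import torch
import torch.multiprocessing as mp

def run(rank, features, out):
    # CUDA is touched only here, inside the spawned worker; the parent
    # never initializes it, so there is nothing to re-initialize.
    device = f"cuda:{rank}" if torch.cuda.is_available() else "cpu"
    x = features.to(device)
    out[rank] = x.sum().item()  # report back through shared memory

def main(world_size):
    features = torch.arange(4, dtype=torch.float32)  # CPU tensor: safe to pass
    out = torch.zeros(world_size)
    out.share_memory_()
    # mp.spawn uses the 'spawn' start method, so workers are fresh
    # interpreters and receive pickled (CPU-side) arguments only.
    mp.spawn(run, args=(features, out), nprocs=world_size)
    return out.tolist()

if __name__ == "__main__":
    print(main(1))  # prints [6.0]
```

The `if __name__ == "__main__":` guard is mandatory with 'spawn', because each worker re-imports the main module and would otherwise re-run the launch code.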

Generally it's possible to use fork with CUDA, but you need to be careful not to do any CUDA operation before forking the subprocesses.
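One way to follow that advice mechanically is to check, right before forking, whether the parent has already created a CUDA context (`assert_fork_safe` is a hypothetical helper; `torch.cuda.is_initialized()` is the real check):

```python
import torch

def assert_fork_safe():
    # Hypothetical guard: call right before forking workers. If the parent
    # has already initialized CUDA (e.g. via model.to('cuda') or any CUDA
    # tensor op), forked children will later hit
    # "Cannot re-initialize CUDA in forked subprocess".
    if torch.cuda.is_initialized():
        raise RuntimeError(
            "CUDA already initialized in the parent; 'fork' is unsafe. "
            "Either delay all CUDA work until inside the workers, or use "
            "torch.multiprocessing.set_start_method('spawn')."
        )
```

Note that merely importing torch (or even calling `torch.cuda.is_available()` in recent versions, which uses a lighter check) does not initialize CUDA; it is the first CUDA tensor operation that does.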

Can you show your changes to the GraphSAGE example code? I remember you created something on the GPU before entering mp.Process, and that's probably still the case after your fix.