I was wondering if it is possible to perform exact inference on multiple GPUs, similar to this part of the docs: 6.6 Exact Offline Inference on Large Graphs — DGL 1.0.1 documentation, except with multiple GPUs instead of just one.
Hi @cluelessdeveloper.
A simple solution is to follow the practice of PyTorch DDP. It could work as follows:
- Prepare the tensors to be shared by all processes before spawning them, including the input feature `x` and the per-layer output embeddings used in the layerwise inference. Mark them as shared memory with PyTorch's in-place method, e.g., `x.share_memory_()`.
- Spawn one process per GPU and pass the shared-memory tensors to each of them (a minimal setup sketch follows the inference loop below).
- Change the inference logic for each layer `l` roughly as follows:

```python
# embeddings[l - 1] holds the output embeddings of layer l - 1
input = x if l == 0 else embeddings[l - 1]
output = embeddings[l]
for it, (input_nodes, output_nodes, blocks) in enumerate(dataloader):
    # Skip batches that are processed by other GPUs
    if it % world_size != rank:
        continue
    block = blocks[0].to(device)
    # Copy the features of the necessary input nodes to GPU
    h = input[input_nodes].to(device)
    # Compute the output. Note that the computation is the same as in
    # training, but only for a single layer.
    h_dst = h[:block.number_of_dst_nodes()]
    h = F.relu(layer(block, (h, h_dst)))
    # Copy the output back to CPU, into the shared output tensor
    output[output_nodes] = h.cpu()
# Remember to insert a sync barrier here (e.g., torch.distributed.barrier())
# to sync all GPUs after each layer.
```
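For completeness, here is a minimal sketch of the setup in steps 1 and 2, assuming a 2-layer model; the worker name `run_layerwise_inference`, the rendezvous address/port, and the tensor sizes are illustrative assumptions, not DGL API. It uses PyTorch's `Tensor.share_memory_()` and `torch.multiprocessing.spawn`:

```python
import torch
import torch.distributed as dist
import torch.multiprocessing as mp

NUM_LAYERS = 2         # assumption: a 2-layer model
HIDDEN = 128           # assumption: hidden/output feature size
NUM_NODES = 1_000_000  # assumption: number of nodes in the graph

def run_layerwise_inference(rank, world_size, x, embeddings):
    # One process per GPU; init a process group so dist.barrier() works.
    dist.init_process_group('nccl', init_method='tcp://127.0.0.1:29500',
                            world_size=world_size, rank=rank)
    device = torch.device(f'cuda:{rank}')
    torch.cuda.set_device(device)
    for l in range(NUM_LAYERS):
        # ... build the dataloader and run the per-layer loop shown above,
        # reading from x / embeddings[l - 1] and writing into the shared
        # embeddings[l] ...
        dist.barrier()  # sync all GPUs after each layer
    dist.destroy_process_group()

if __name__ == '__main__':
    world_size = torch.cuda.device_count()
    # Step 1: allocate the tensors in shared memory before spawning.
    x = torch.randn(NUM_NODES, HIDDEN).share_memory_()
    embeddings = [torch.zeros(NUM_NODES, HIDDEN).share_memory_()
                  for _ in range(NUM_LAYERS)]
    # Step 2: spawn one worker per GPU, passing the shared tensors.
    mp.spawn(run_layerwise_inference,
             args=(world_size, x, embeddings),
             nprocs=world_size, join=True)
```

Since the batches are partitioned by `it % world_size`, each GPU writes a disjoint set of `output_nodes`, so no gather step is needed: after the workers join, `embeddings[-1]` in the parent process already holds the full result.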