I was wondering whether it is possible to perform exact inference on multiple GPUs. Something similar to the "6.6 Exact Offline Inference on Large Graphs" section of the DGL 1.0.1 documentation, except with multiple GPUs as opposed to just one.
Hi @cluelessdeveloper.
A simple solution is to follow the practice of PyTorch DDP. It may work as follows.
- Prepare the tensors to be shared by all processes before spawning them, including the input feature `x` and the intermediate `embeddings[]`, i.e., `x` and `y` in the layerwise inference. Mark them as shared memory, e.g., `x.share_memory_()`.
- Spawn a process for each GPU and pass the shared-memory tensors to them.
- Change the inference logic.
# embeddings[l] holds the output embeddings of layer l,
# so layer l reads its input from embeddings[l-1].
input = x if l == 0 else embeddings[l-1]
output = embeddings[l]
for it, (input_nodes, output_nodes, blocks) in enumerate(dataloader):
    # Skip batches that are processed by other GPUs
    if it % world_size != rank:
        continue
    block = blocks[0]
    # Copy the features of necessary input nodes to GPU
    h = input[input_nodes].to(device)
    # Compute output. Note that this computation is the same
    # but only for a single layer.
    h_dst = h[:block.number_of_dst_nodes()]
    h = F.relu(layer(block, (h, h_dst)))
    # Copy the output back to CPU.
    output[output_nodes] = h.cpu()
# Remember to insert a sync barrier to sync all GPUs after a layer.
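
To see why the barrier after each layer matters, here is a minimal CPU-only sketch of the same control flow. It makes several assumptions that are not part of the answer above: threads stand in for the per-GPU processes, plain Python lists stand in for the shared-memory tensors, and the hypothetical `neighbors` function and toy layer functions stand in for the graph blocks and the GNN layers. It is only meant to illustrate the round-robin batch assignment and the per-layer synchronization, not DGL's actual API.

```python
import threading

def layerwise_inference(x, layers, neighbors, batch_size, world_size):
    """Toy layerwise inference; threads play the role of per-GPU processes."""
    num_nodes = len(x)
    # embeddings[l] is the shared output buffer of layer l for all nodes.
    embeddings = [[None] * num_nodes for _ in layers]
    # Every worker iterates over the same batches in the same order.
    batches = [list(range(s, min(s + batch_size, num_nodes)))
               for s in range(0, num_nodes, batch_size)]
    barrier = threading.Barrier(world_size)

    def worker(rank):
        for l, layer in enumerate(layers):
            inp = x if l == 0 else embeddings[l - 1]
            out = embeddings[l]
            for it, nodes in enumerate(batches):
                # Skip batches that are processed by other workers.
                if it % world_size != rank:
                    continue
                for n in nodes:
                    # A node's output depends on its neighbors' inputs,
                    # which another worker may have written in layer l-1.
                    out[n] = layer(inp[n], [inp[m] for m in neighbors(n)])
            # Wait until all workers have finished layer l, so that
            # embeddings[l] is complete before anyone reads it.
            barrier.wait()

    threads = [threading.Thread(target=worker, args=(r,))
               for r in range(world_size)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return embeddings[-1]
```

Calling it with `world_size=1` and with `world_size=3` on the same inputs should give identical results, because each batch is handled by exactly one worker and no worker starts the next layer early. In a real multi-GPU run the thread barrier would presumably be replaced by something like `torch.distributed.barrier()` after initializing a process group, and the lists by tensors placed in shared memory before spawning.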