Multi-GPU Exact Inference

I was wondering whether it is possible to run exact inference across multiple GPUs, similar to this part of the docs: 6.6 Exact Offline Inference on Large Graphs — DGL 1.0.1 documentation, except with multiple GPUs instead of just one.

Hi @cluelessdeveloper.

A simple solution is to follow the usual PyTorch DDP practice. It could work as follows.

  1. Before spawning the processes, prepare the tensors that all of them will share: the input features x and the intermediate embeddings[], i.e., x and y in the layerwise inference. Move them into shared memory with PyTorch's in-place Tensor.share_memory_(), e.g., x.share_memory_() (it also returns the tensor, so x = x.share_memory_() works too).
  2. Spawn a process for each GPU and pass the shared memory tensors to them.
  3. Change the inference logic.
# embeddings[l] holds the output of layer l, so layer l reads from
# embeddings[l-1] (or from the input features x when l == 0).
input = x if l == 0 else embeddings[l-1]
output = embeddings[l]
for it, (input_nodes, output_nodes, blocks) in enumerate(dataloader):
    # Skip batches that are processed by other GPUs
    if it % world_size != rank:
        continue
    block = blocks[0]

    # Copy the features of necessary input nodes to GPU
    h = input[input_nodes].to(device)
    # Compute output.  Note that this computation is the same
    # but only for a single layer.
    h_dst = h[:block.number_of_dst_nodes()]
    h = F.relu(layer(block, (h, h_dst)))
    # Copy the output back to CPU.
    output[output_nodes] = h.cpu()
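# Remember to insert a barrier after each layer (e.g. torch.distributed.barrier(),
# assuming a process group has been initialized) so that all GPUs finish writing
# embeddings[l] before any of them starts layer l + 1.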
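
To see how the `it % world_size != rank` check splits the work, here is a minimal plain-Python sketch. The helper name `batches_for_rank` and the values 10 and 4 are made up for illustration. It assumes every process iterates over the batches in the same order (for example, a dataloader created with shuffle=False), since otherwise the split would not be consistent across GPUs.

```python
def batches_for_rank(num_batches, rank, world_size):
    """Return the batch indices that the process with this rank keeps.

    Mirrors the `if it % world_size != rank: continue` check in the loop.
    """
    return [it for it in range(num_batches) if it % world_size == rank]


# Illustrative values only: 10 batches split across 4 GPUs.
num_batches, world_size = 10, 4
assignment = {
    rank: batches_for_rank(num_batches, rank, world_size)
    for rank in range(world_size)
}
for rank, batches in assignment.items():
    print(f"rank {rank}: {batches}")

# Every batch is handled by exactly one rank, so each row of the shared
# output tensor is written by a single process.
covered = sorted(it for batches in assignment.values() for it in batches)
assert covered == list(range(num_batches))
```

Note that with this skip-based scheme each process still iterates over (and therefore samples) every batch and only discards the ones it does not own, so the sampling work is duplicated across processes. That keeps the code simple, but it may be worth partitioning the node IDs per rank instead if sampling turns out to be the bottleneck.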
