Multi-GPU Exact Inference

cluelessdeveloper · February 16, 2023, 1:31am

I was wondering whether it is possible to perform exact inference across multiple GPUs, similar to this part of the docs: 6.6 Exact Offline Inference on Large Graphs (DGL 1.0.1 documentation), but with multiple GPUs instead of just one.

czkkkkkk · February 16, 2023, 8:59am

Hi @cluelessdeveloper.

A simple solution is to follow the practice of PyTorch DDP. It could work as follows:

1. Before spawning any processes, prepare the tensors that all of them will share: the input features x and the list of intermediate embeddings, embeddings[], i.e., x and y in the layer-wise inference from the docs. Move them into shared memory, e.g., x = x.share_memory_().
2. Spawn one process per GPU and pass the shared-memory tensors to each of them.
3. Change the inference logic so that each process handles only its own share of the batches:
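
The first two steps (sharing the tensors and spawning one process per GPU) might look roughly like the sketch below. This is only an outline under assumptions: the names alloc_shared, worker, and main are made up for illustration, the zero tensors stand in for your real features, and the NCCL backend and port 29500 are just common defaults, not requirements. The per-layer loop for the last step goes where the placeholder comment is.

```python
import torch
import torch.distributed as dist
import torch.multiprocessing as mp


def alloc_shared(num_nodes, in_feats, layer_out_feats):
    """Allocate CPU tensors in shared memory: x plus one output buffer per layer."""
    # Placeholder features; in practice this would be the graph's real feature tensor.
    x = torch.zeros(num_nodes, in_feats).share_memory_()
    embeddings = [
        torch.zeros(num_nodes, out_feats).share_memory_()
        for out_feats in layer_out_feats
    ]
    return x, embeddings


def worker(rank, world_size, x, embeddings):
    """Entry point of one GPU process; mp.spawn passes the rank as first argument."""
    dist.init_process_group(
        backend="nccl",
        init_method="tcp://127.0.0.1:29500",  # assumed free local port
        world_size=world_size,
        rank=rank,
    )
    torch.cuda.set_device(rank)
    # ... run the per-layer loop here, calling dist.barrier() after each layer ...
    dist.destroy_process_group()


def main():
    world_size = torch.cuda.device_count()
    x, embeddings = alloc_shared(num_nodes=1000, in_feats=16, layer_out_feats=[32, 8])
    # Every process receives handles to the same shared-memory tensors.
    mp.spawn(worker, args=(world_size, x, embeddings), nprocs=world_size)
```

Call main() under an `if __name__ == "__main__":` guard, since spawned processes re-import the script.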

# embeddings[l] holds the output embeddings of layer l,
# so layer l reads its input from embeddings[l - 1].
input = x if l == 0 else embeddings[l-1]
output = embeddings[l]
for it, (input_nodes, output_nodes, blocks) in enumerate(dataloader):
    # Skip batches that are processed by other GPUs
    if it % world_size != rank:
        continue
    block = blocks[0]

    # Copy the features of necessary input nodes to GPU
    h = input[input_nodes].to(device)
    # Compute the output. This is the same computation as in
    # single-GPU layer-wise inference: only a single layer is applied.
    h_dst = h[:block.number_of_dst_nodes()]
    h = F.relu(layer(block, (h, h_dst)))
    # Copy the output back to the shared CPU tensor.
    output[output_nodes] = h.cpu()
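# Remember to insert a sync barrier after each layer (e.g. torch.distributed.barrier(),
# which needs an initialized process group) so that every GPU has finished writing
# layer l before any GPU starts reading it for layer l + 1.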
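
To see why the skip rule is safe, here is a small pure-Python check of which batches each rank keeps. The helper name batches_for_rank and the numbers are made up for illustration. It only models the `it % world_size != rank` test from the loop above, not the inference itself.

```python
def batches_for_rank(num_batches, rank, world_size):
    """Indices of the batches a rank keeps under the `it % world_size` rule."""
    return [it for it in range(num_batches) if it % world_size == rank]


world_size = 3
num_batches = 10
assignment = {
    rank: batches_for_rank(num_batches, rank, world_size)
    for rank in range(world_size)
}
print(assignment)
# {0: [0, 3, 6, 9], 1: [1, 4, 7], 2: [2, 5, 8]}
```

Every batch lands on exactly one rank, so the ranks write to disjoint rows of the shared output. This only holds if every process iterates over the batches in the same order, so the dataloader should not shuffle.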
