What do the steps and blocks in the epoch loop of train_sampling_multi_gpu.py refer to? Does it mean that the input graph is broken into sub-graphs and sent as input, is the whole graph passed again and again, or is it something else altogether? I am basically trying to understand the flow of training GraphSAGE on multiple GPUs. It would be awesome if someone could explain the flow to me.
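For context, here is roughly how I think the dataloader that yields these blocks is constructed in the script (a sketch from memory, not the exact code; the fan-outs and the name train_nid are my assumptions, and in the multi-GPU script each process gets its own shard of the training node IDs):

import dgl

# Sample a fixed number of neighbors per GNN layer; the fan-outs [10, 25] are an assumption.
sampler = dgl.dataloading.MultiLayerNeighborSampler([10, 25])
# Iterating this yields (input_nodes, seeds, blocks) mini-batches over the training node IDs.
dataloader = dgl.dataloading.NodeDataLoader(
    train_g,
    train_nid,            # this process's shard of training node IDs (name assumed)
    sampler,
    batch_size=args.batch_size,
    shuffle=True,
    drop_last=False,
    num_workers=args.num_workers)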
for epoch in range(args.num_epochs):
    tic = time.time()

    # Loop over the dataloader to sample the computation dependency graph as a list of
    # blocks.
    for step, (input_nodes, seeds, blocks) in enumerate(dataloader):
        if proc_id == 0:
            tic_step = time.time()

        # Load the input features as well as output labels
        batch_inputs, batch_labels = load_subtensor(train_g, train_g.ndata['labels'],
                                                    seeds, input_nodes, dev_id)
        blocks = [block.int().to(dev_id) for block in blocks]

        # Compute loss and prediction
        batch_pred = model(blocks, batch_inputs)
        loss = loss_fcn(batch_pred, batch_labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
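For reference, my understanding is that load_subtensor just slices out the features and labels of the sampled nodes and copies them to the target GPU, something like this (my paraphrase, not necessarily the exact code from the script; the 'features' key is an assumption):

def load_subtensor(g, labels, seeds, input_nodes, dev_id):
    # Features for every node the sampled blocks need as input ('features' key assumed)
    batch_inputs = g.ndata['features'][input_nodes].to(dev_id)
    # Labels only for the seed (output) nodes of this mini-batch
    batch_labels = labels[seeds].to(dev_id)
    return batch_inputs, batch_labels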
I am pretty new to DGL, so I am struggling a bit.
TIA