What do the steps and blocks in the epoch loop of train_sampling_multi_gpu.py refer to? Does it mean that the input graph is broken into sub-graphs and fed in piece by piece, is the whole graph passed again and again, or is it something else altogether? I am basically trying to understand the flow of training GraphSAGE on multiple GPUs. It would be awesome if someone could explain the flow to me.
```python
for epoch in range(args.num_epochs):
    tic = time.time()

    # Loop over the dataloader to sample the computation dependency graph
    # as a list of blocks.
    for step, (input_nodes, seeds, blocks) in enumerate(dataloader):
        if proc_id == 0:
            tic_step = time.time()

        # Load the input features as well as output labels
        batch_inputs, batch_labels = load_subtensor(
            train_g, train_g.ndata['labels'], seeds, input_nodes, dev_id)
        blocks = [block.int().to(dev_id) for block in blocks]

        # Compute loss and prediction
        batch_pred = model(blocks, batch_inputs)
        loss = loss_fcn(batch_pred, batch_labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
```
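To make my question concrete, here is my rough mental model in plain Python (no DGL; the toy adjacency list and batch size are made up by me): is each step just a mini-batch of seed nodes plus the neighbors needed to compute their representations, rather than the whole graph being passed again?

```python
# Toy guess at what the sampling dataloader does (plain Python, no DGL):
# the full graph stays put; each step only draws a mini-batch of seed
# nodes and the neighbors needed to compute their representations.
adjacency = {
    0: [1, 2],
    1: [0, 3],
    2: [0],
    3: [1],
}
train_nodes = [0, 1, 2, 3]
batch_size = 2

for start in range(0, len(train_nodes), batch_size):
    seeds = train_nodes[start:start + batch_size]
    # "input_nodes": the seeds plus their 1-hop neighbors -- the only
    # node features that would need to be loaded for this step.
    input_nodes = sorted(set(seeds) | {n for s in seeds for n in adjacency[s]})
    print(start // batch_size, seeds, input_nodes)
```

Is that roughly what `(input_nodes, seeds, blocks)` corresponds to, with `blocks` holding the sampled message-passing structure between those node sets?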
I am pretty new to DGL, so I am struggling a bit.