Steps in GraphSAGE

What do the steps and blocks in the epoch loop of train_sampling_multi_gpu.py refer to? Does it mean that the input graph is broken into subgraphs and sent as input, is the whole graph passed in again and again, or is it something else altogether? I am basically trying to understand the flow of training with GraphSAGE on multiple GPUs. It would be awesome if someone could explain the flow to me.

for epoch in range(args.num_epochs):
    tic = time.time()

    # Loop over the dataloader to sample the computation dependency graph as a list of
    # blocks.
    for step, (input_nodes, seeds, blocks) in enumerate(dataloader):
        if proc_id == 0:
            tic_step = time.time()

        # Load the input features as well as output labels
        batch_inputs, batch_labels = load_subtensor(train_g, train_g.ndata['labels'], seeds, input_nodes, dev_id)
        blocks = [block.int().to(dev_id) for block in blocks]
        # Compute loss and prediction
        batch_pred = model(blocks, batch_inputs)
        loss = loss_fcn(batch_pred, batch_labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

I am pretty new to using DGL, so I am struggling a bit.
TIA

The code snippet is more about sampling-based training of GNNs for node classification than about multi-GPU training. Can you take a look at the introduction of Chapter 6 of the user guide and Section 6.1 and see whether they explain the logic here?
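To make the terms concrete, here is a minimal sketch of what step, input_nodes, seeds, and blocks mean, assuming the dgl.dataloading.MultiLayerNeighborSampler and NodeDataLoader API that this example uses; the graph, fanouts, and batch size below are made up for illustration:

import dgl
import torch

# A toy graph standing in for train_g; in the example this is the real training graph.
g = dgl.rand_graph(1000, 5000)
train_nids = torch.arange(g.num_nodes())

# Sample (at most) 10 neighbors per node for each of the 2 layers.
sampler = dgl.dataloading.MultiLayerNeighborSampler([10, 10])
dataloader = dgl.dataloading.NodeDataLoader(
    g, train_nids, sampler,
    batch_size=64, shuffle=True, drop_last=False)

for step, (input_nodes, seeds, blocks) in enumerate(dataloader):
    # `seeds` are the nodes whose labels this mini-batch predicts;
    # `input_nodes` are all the nodes needed to compute their 2-hop
    # representations; `blocks` holds one bipartite sampled subgraph per layer.
    print(step, len(input_nodes), len(seeds), [b.num_edges() for b in blocks])
    break

So the whole graph is not passed again and again: each step trains on one mini-batch of seed nodes plus the sampled multi-hop neighborhood (the blocks) needed to compute their outputs, and one epoch iterates over all such mini-batches.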


Can you explain why I see a csrmmt GPU kernel while running the update_all function?

The combination of the copy_u message function and the sum reduce function is dispatched into a single SpMM kernel call, which shows up as csrmmt.
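For reference, here is a minimal sketch of the kind of call that triggers it, assuming a CUDA-enabled DGL build (the graph and feature sizes are arbitrary):

import dgl
import dgl.function as fn
import torch

g = dgl.rand_graph(100, 500).to('cuda')
g.ndata['h'] = torch.randn(100, 16, device='cuda')

# copy_u + sum is recognized by DGL as a sparse-dense matrix multiplication
# (SpMM): conceptually h_new = A^T @ h, with A the adjacency matrix stored
# in CSR form, which cuSPARSE executes as a csrmmt kernel.
g.update_all(fn.copy_u('h', 'm'), fn.sum('m', 'h_new'))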


OK, that much is clear, but I see 3 csrmmt kernels called one after the other contiguously, with almost zero time gap. What could be the reason for the 2 additional csrmmt calls after the first one?
PS: All the csrmmt calls are within the gspmm op itself.

Could you please elaborate on your profiling results (e.g., in text/image form)?
The model has 3 layers in our GraphSAGE example, and a cuSPARSE SpMM call would be triggered once per layer.
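For context, here is a rough sketch of that model structure (simplified from the example, omitting activation and dropout, and assuming n_layers >= 2), where each SAGEConv layer performs one neighbor aggregation and hence one SpMM:

import torch.nn as nn
import dgl.nn as dglnn

class SAGE(nn.Module):
    def __init__(self, in_feats, n_hidden, n_classes, n_layers):
        super().__init__()
        self.layers = nn.ModuleList()
        self.layers.append(dglnn.SAGEConv(in_feats, n_hidden, 'mean'))
        for _ in range(n_layers - 2):
            self.layers.append(dglnn.SAGEConv(n_hidden, n_hidden, 'mean'))
        self.layers.append(dglnn.SAGEConv(n_hidden, n_classes, 'mean'))

    def forward(self, blocks, x):
        h = x
        for layer, block in zip(self.layers, blocks):
            h = layer(block, h)  # one aggregation here -> one SpMM call
        return h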

So I see 2 aggregation steps, I guess because I set the num_layers argument to 2. In the first aggregation step I see 3 csrmmt kernels, while in the second one I see just 1.
Also, the num_layers argument specifies the number of hops (i.e., the depth) up to which aggregation goes, right?

I just read another thread of yours: GPU Kernels while running graphsage model.
The 3 consecutive kernel calls you mentioned do not come from 3 layers; they are triggered by a single cuSPARSE csrspmm call (cuSPARSE can launch several kernels one after another within a single SpMM operation).


OK, that explains a lot. Can you also tell me why I see volta_sgemm_32x32 after every aggregation step? As far as I understand, it is used for the MLP layers, right? And also, why do I see 2 volta_sgemm_32x32 kernels?

Yes, they are used for the MLP: several sgemm kernels with a regular shape (e.g., 32x32) are combined to compute an MLP whose shape is irregular.
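As a toy illustration (the sizes here are arbitrary, not taken from the example): profiling a single linear layer whose GEMM dimensions do not line up with a 32x32 tile may show such fixed-shape sgemm launches, since the GPU BLAS library covers the irregular GEMM with regular tiles:

import torch
import torch.nn as nn

# One linear layer with dimensions that are not multiples of the tile size.
# Profiling the matmul below on Volta hardware can show fixed-shape tile
# kernels (e.g., volta_sgemm_32x32), even though the layer's own shape is
# irregular.
fc = nn.Linear(100, 16).cuda()
x = torch.randn(1000, 100, device='cuda')
y = fc(x)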