Inferior Performance with Neighborhood Sampling

ycjing · August 12, 2021, 2:25am

Hi

Thanks for this great tool! When I tried neighborhood sampling by following the tutorial "Node Classification with Neighborhood Sampling", I found that the accuracy with neighborhood sampling was lower than with full graph processing (83% vs. 88%). Part of the code is as follows:

With neighborhood sampling:

    # initialize graph
    cur_best = 0
    dur = []
    for epoch in range(args.n_epochs):
        # print(epoch)
        model.train()

        if epoch >= 3:
            t0 = time.time()
        
        loss = torch.Tensor([0.]).to(device)
        # forward
        for input_nodes, output_nodes, blocks in train_dataloader:
            blocks = [b.to(device) for b in blocks]
            h = blocks[0].srcdata['feat']
            h = model(blocks, h)
            logits = h
            # print(logits)
            loss = loss + loss_fcn(logits, blocks[-1].dstdata['label'])
        # print(loss)
        loss = loss / len(train_dataloader)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

Full graph processing:

    # initialize graph
    dur = []
    for epoch in range(args.n_epochs):
        model.train()
        if epoch >= 3:
            t0 = time.time()
        # forward
        logits = model(features)
        loss = loss_fcn(logits[train_mask], labels[train_mask])

        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

I am curious why this occurs and how I can improve the performance of the model with neighborhood sampling. I will truly appreciate your help. Thank you in advance!

Best,
Yongcheng

BarclayII · August 12, 2021, 2:29am

Neighborhood sampling can affect performance for various reasons: it depends on the actual model (for example, which aggregator you use), how large your dataset is, how your optimizer is set up, and so on. In particular, there’s not much benefit to sampling if your graph is small.

One thing I noticed, however, is that you are summing up the loss computed from every minibatch and doing a single full-batch gradient descent step per epoch, instead of one step per minibatch. For large-graph training we usually do the latter.
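To make the difference concrete, here is a minimal sketch in plain Python, independent of DGL and PyTorch. It fits a hypothetical one-parameter model `y = w * x` on toy data and compares the two update schedules: averaging the minibatch gradients and updating once per epoch (what the loop in the question does), versus updating after every minibatch. All names, the data, and the learning rate are illustrative assumptions, not taken from the original code.

```python
def batch_grad(w, batch):
    """Gradient of the mean squared error of y = w * x over one minibatch."""
    return sum(2.0 * (w * x - y) * x for x, y in batch) / len(batch)


def train_one_step_per_epoch(batches, lr, n_epochs):
    """Average the minibatch gradients, then update once per epoch."""
    w, steps = 0.0, 0
    for _ in range(n_epochs):
        grad = sum(batch_grad(w, b) for b in batches) / len(batches)
        w -= lr * grad
        steps += 1
    return w, steps


def train_one_step_per_minibatch(batches, lr, n_epochs):
    """Update right after each minibatch (the usual minibatch SGD schedule)."""
    w, steps = 0.0, 0
    for _ in range(n_epochs):
        for b in batches:
            w -= lr * batch_grad(w, b)
            steps += 1
    return w, steps


# Toy data (assumption): y = 3 * x, split into 4 minibatches of 2 samples each.
xs = [k / 8 for k in range(1, 9)]
data = [(x, 3.0 * x) for x in xs]
batches = [data[i:i + 2] for i in range(0, len(data), 2)]

w_epoch, steps_epoch = train_one_step_per_epoch(batches, lr=0.1, n_epochs=10)
w_batch, steps_batch = train_one_step_per_minibatch(batches, lr=0.1, n_epochs=10)
print(f"one step per epoch:     w={w_epoch:.3f} after {steps_epoch} updates")
print(f"one step per minibatch: w={w_batch:.3f} after {steps_batch} updates")
```

With the same number of epochs, the per-minibatch schedule performs four times as many updates here and ends closer to the true value `w = 3`. In the training loop from the question, the equivalent change would be to move `optimizer.zero_grad()`, `loss.backward()`, and `optimizer.step()` inside the `for ... in train_dataloader` loop and to compute the loss from the current minibatch only.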

ycjing · August 12, 2021, 2:37am

Hi @BarclayII

Thank you for such a quick response! I truly appreciate it. Yes, I currently use the Cora dataset in the supervised setting. My task is to extract the dependency graph of each node and to propose an algorithm that does some further processing with the obtained dependency graph.

So it will be hard to compare the proposed algorithm with existing full-graph-processing ones if the initial model with neighborhood sampling already performs worse at the start. Could you please give me some hints on the potential reasons for the inferior performance with neighborhood sampling? Thank you so much!

Best,
Yongcheng

BarclayII · August 12, 2021, 3:46am

The number of neighbors you choose and the module you select might matter. For instance, SAGEConv and GATConv usually work well with neighbor sampling, while GraphConv does not, due to an edge weight being computed during the forward pass. Doing a full-batch gradient descent step, as you have written in your code, might also be a reason.
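As a rough illustration of why the number of sampled neighbors (the fanout) matters, the following plain-Python sketch samples neighbors for a single node of a hypothetical toy graph. It is not DGL code; `sample_neighbors`, `mean_aggregate`, and the toy data are assumptions made up for this example. It shows two things: the layer only sees the sampled degree, not the full degree, and a mean over sampled neighbors matches the full-graph mean only on average.

```python
import random


def sample_neighbors(adj, nodes, fanout, rng):
    """Keep at most `fanout` neighbors per node, chosen uniformly without replacement."""
    block = {}
    for v in nodes:
        nbrs = adj[v]
        block[v] = list(nbrs) if len(nbrs) <= fanout else rng.sample(nbrs, fanout)
    return block


def mean_aggregate(block, feat, v):
    """Mean of the neighbor features present in `block` for node v."""
    return sum(feat[u] for u in block[v]) / len(block[v])


# Hypothetical toy graph: node 0 has ten neighbors with scalar features 1..10.
adj = {0: list(range(1, 11))}
feat = {u: float(u) for u in range(1, 11)}
rng = random.Random(0)

full_mean = mean_aggregate(adj, feat, 0)
block = sample_neighbors(adj, [0], fanout=3, rng=rng)
sampled_mean = mean_aggregate(block, feat, 0)

# Averaged over many independent samples, the sampled mean approaches the full mean.
n_trials = 2000
avg_sampled_mean = sum(
    mean_aggregate(sample_neighbors(adj, [0], 3, rng), feat, 0)
    for _ in range(n_trials)
) / n_trials

print("full degree:", len(adj[0]), "sampled degree:", len(block[0]))
print("full mean:", full_mean, "one sampled mean:", sampled_mean)
print("average sampled mean:", round(avg_sampled_mean, 2))
```

One plausible reading of the GraphConv remark, stated here as an assumption: a layer that derives its normalization from node degrees will compute it from the sampled degree (3 in this sketch) rather than the full degree (10), so its output can differ systematically from full-graph training. A mean-style aggregator is less sensitive to this, although a small fanout still adds variance to each minibatch.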

ycjing · August 12, 2021, 7:36am

Hi @BarclayII

I truly appreciate your help. Thank you!

Best,
Yongcheng
