Memory Leakage with HeteroGraphConv

I am experiencing a memory leak with HeteroGraphConv. I am using DGL 0.7.1 and PyTorch 1.9.1.

How can I fix this issue?

Sample code is provided below.

import itertools
import torch
import torch.nn.functional as F

# Initial embeddings
embed = NodeEmbed(num_nodes=num_nodes, embed_size=embed_size, device='cpu')
embeds = embed()

# Define the model
model = RGCN(in_feats=embed_size, hid_feats=h_hidden, out_feats=num_classes, rel_names=g.etypes).to('cpu')

# Optimizer
optimizer = torch.optim.Adam(itertools.chain(model.parameters(), embed.parameters()), lr=0.01)

# training
all_accuracy = []
best_test_accuracy = 0    


for e in range(epochs):
    # forward
    embeds = embed()

    logits = model(g, embeds)[category_to_predict]
    pred = torch.argmax(logits, dim=1)

    pred_train = pred[PP_train_IDs]
    train_accuracy = (pred_train == PP_train_label).sum().item() / len(pred_train)

    pred_test = pred[PP_test_IDs]
    test_accuracy = (pred_test == PP_test_label).sum().item() / len(pred_test)

    if test_accuracy > best_test_accuracy:
        best_test_accuracy = test_accuracy


    # compute loss
    loss = F.cross_entropy(logits[PP_train_IDs], PP_train_label)

    # backward
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

    if e % 2 == 0:
        pred_val = torch.argmax(logits, dim=1)[PP_val_IDs]
        val_accuracy = (pred_val == PP_val_label).sum().item() / len(pred_val)
        
        all_accuracy.append([e, train_accuracy, test_accuracy, val_accuracy])
        
        print('In epoch {}, loss: {:.4f}, train_acc: {:.4f}, test_acc: {:.4f}, val_acc: {:.4f}, best_test_acc: {:.4f}'.format(
            e, loss.item(), train_accuracy, test_accuracy, val_accuracy, best_test_accuracy))

How did you define NodeEmbed and RGCN? Could you provide a minimal runnable example so we can debug?
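If it helps, the usual shape of these two modules in the DGL heterogeneous RGCN examples is roughly the sketch below. This is only an assumption about how they might be written (it assumes num_nodes is a dict mapping node type to node count, an nn.ParameterDict of per-node-type embeddings, and a HeteroGraphConv wrapping one GraphConv per relation), not your actual code:

import torch
import torch.nn as nn
import torch.nn.functional as F
import dgl.nn as dglnn

class NodeEmbed(nn.Module):
    # One learnable embedding matrix per node type, created once in __init__.
    def __init__(self, num_nodes, embed_size, device='cpu'):
        super().__init__()
        self.embeds = nn.ParameterDict()
        for ntype, n in num_nodes.items():
            param = nn.Parameter(torch.empty(n, embed_size, device=device))
            nn.init.xavier_uniform_(param)
            self.embeds[ntype] = param

    def forward(self):
        # Return a plain dict {node type: embedding tensor}, as HeteroGraphConv expects.
        return {ntype: emb for ntype, emb in self.embeds.items()}

class RGCN(nn.Module):
    # Two-layer R-GCN: each layer is a HeteroGraphConv holding one GraphConv per relation.
    def __init__(self, in_feats, hid_feats, out_feats, rel_names):
        super().__init__()
        self.conv1 = dglnn.HeteroGraphConv(
            {rel: dglnn.GraphConv(in_feats, hid_feats) for rel in rel_names},
            aggregate='sum')
        self.conv2 = dglnn.HeteroGraphConv(
            {rel: dglnn.GraphConv(hid_feats, out_feats) for rel in rel_names},
            aggregate='sum')

    def forward(self, graph, inputs):
        h = self.conv1(graph, inputs)               # dict: node type -> hidden features
        h = {k: F.relu(v) for k, v in h.items()}
        h = self.conv2(graph, h)                    # dict: node type -> logits
        return h

In this pattern the embedding parameters and convolution layers are created once in __init__ and only reused in forward, so nothing new should be allocated per epoch. Knowing how your definitions differ from this would help narrow down the leak.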
