Train and test graphs through EdgeDataLoader

Good morning.

I’m following this tutorial: 6.3 Training GNN for Link Prediction with Neighborhood Sampling — DGL 0.7.2 documentation

I have train and test splits, but I’m not sure how to use them during the training loop.

I guess I should use the training graph like this:

dataloader = dgl.dataloading.EdgeDataLoader(
    train_g, train_eid_dict, sampler,
    negative_sampler=dgl.dataloading.negative_sampler.Uniform(5),
    batch_size=1024,
    shuffle=True,
    drop_last=False,
    num_workers=4)

However, I’m not sure how to use the test one. I would like to compute AUC to evaluate the model on this graph. Should I create another dataloader for the test graph in order to generate positive and negative evaluation scores during the loop?

Also, since my graphs are heterographs, I would need to adapt this AUC function:

def compute_auc(pos_score, neg_score):
    scores = torch.cat([pos_score, neg_score]).numpy()
    labels = torch.cat(
        [torch.ones(pos_score.shape[0]), torch.zeros(neg_score.shape[0])]).numpy()        
    return roc_auc_score(labels, scores)

Could I concatenate the pos_score and neg_score for the different etypes in order to get a single tensor of scores, and then create another tensor of 1s and 0s? Or should I evaluate every etype separately?

Thank you so much.

Hi,

We added a new tutorial about how to test on the link prediction task, at Stochastic Training of GNN for Link Prediction — DGL 0.8 documentation.

Could you take a look to see whether this can address your question?

Thank you!! It’s quite useful. In this line:

  valid_acc, test_acc = evaluate(emb, node_labels, train_nids, valid_nids, test_nids)

What should “node_labels” be?

Sorry, but it doesn’t directly address my doubt. I still would like to know whether I could do something like the following:

train_dataloader = dgl.dataloading.EdgeDataLoader(
    train_g, train_eid_dict, sampler,
    negative_sampler=dgl.dataloading.negative_sampler.Uniform(5),
    batch_size=1024,
    shuffle=True,
    drop_last=False,
    num_workers=4)

test_dataloader = dgl.dataloading.EdgeDataLoader(
    test_g, test_eid_dict, sampler,
    negative_sampler=dgl.dataloading.negative_sampler.Uniform(5),
    batch_size=1024,
    shuffle=True,
    drop_last=False,
    num_workers=4)

And this the AUC:

def compute_auc(pos_score, neg_score, canonical_etypes):
    # Collect the per-etype score tensors, positives first.
    all_scores = []
    for etype in canonical_etypes:
        all_scores.append(pos_score[etype])

    # Count individual positive scores, not the number of etypes,
    # so that labels and scores have matching lengths.
    n_pos = sum(t.shape[0] for t in all_scores)

    for etype in canonical_etypes:
        all_scores.append(neg_score[etype])

    n_neg = sum(t.shape[0] for t in all_scores) - n_pos

    scores = torch.cat(all_scores).numpy()
    labels = torch.cat(
        [torch.ones(n_pos), torch.zeros(n_neg)]).numpy()
    return roc_auc_score(labels, scores)

This would be the loop

for epoch in range(n_epochs):
  model.train()
  for input_nodes, positive_graph, negative_graph, blocks in train_dataloader:
      # Move blocks and graphs to the target device here if training on GPU.
      input_features = blocks[0].srcdata['Feats']
      pos_score, neg_score = model(positive_graph, negative_graph, blocks, {'ent': input_features})
      loss = compute_loss(pos_score, neg_score, g.canonical_etypes)
      opt.zero_grad()
      loss.backward()
      opt.step()
      print(loss.item())

model.eval()
with torch.no_grad():
    for input_nodes, positive_graph, negative_graph, blocks in test_dataloader:
        input_features = blocks[0].srcdata['Feats']
        pos_score, neg_score = model(positive_graph, negative_graph, blocks, {'ent': input_features})
        test_auc = compute_auc(pos_score, neg_score, g.canonical_etypes)

AUC here would be computed by batches. Is that ok? If so, could I use the mean of the AUC from every batch?

You can save the pos_score and neg_score tensors in the loop, concatenate them all at the end, and then compute the AUC once over the whole test set.
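A minimal sketch of that accumulate-then-concatenate pattern (pure Python for clarity; with real score tensors you would collect them into lists inside the test loop and use torch.cat at the end). The dict layout below mirrors the per-etype pos_score/neg_score dicts from the loop above; the etype key is a made-up example:

```python
def flatten_scores(pos_batches, neg_batches):
    """Flatten per-batch, per-etype score dicts into one flat score list
    plus a matching 1/0 label list, positives first."""
    scores, labels = [], []
    for batch in pos_batches:               # one dict per test batch
        for etype_scores in batch.values():  # one entry per canonical etype
            scores.extend(etype_scores)
            labels.extend([1] * len(etype_scores))
    for batch in neg_batches:
        for etype_scores in batch.values():
            scores.extend(etype_scores)
            labels.extend([0] * len(etype_scores))
    return scores, labels
```

The resulting scores/labels pair can then go straight into roc_auc_score, giving one AUC over the whole test set rather than a mean of per-batch AUCs.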
