How to do validation in link prediction for a heterogeneous graph

I want to build a link prediction model on a heterogeneous graph, and I have prepared the following data files:

  • graph.txt: “src_node edge_type dst_node”, used to build the graph object graph
  • train.txt: “src_node edge_type dst_node label”, used as train_sample_data; it is a subset of graph.txt, and “label” is always 1
  • valid.txt: “src_node edge_type dst_node label”, used as valid_sample_data; labels are 0 or 1, and the edges (src_node edge_type dst_node) with label 1 are not included in graph.txt

Then I set up the training dataloader with EdgeDataLoader like this:

    train_eids = generate_edges(graph, vocab, train_sample_data, num_workers)  # returns the eids of the edges in train_sample_data
    train_dataloader = dgl.dataloading.EdgeDataLoader(
        graph, train_eids, neighbor_sampler,
        negative_sampler=dgl.dataloading.negative_sampler.Uniform(5),
        batch_size=args.batch_size,
        shuffle=True,
        drop_last=False,
        pin_memory=True,
        num_workers=args.workers
    )

My question is: how do I do validation in this situation, given that the edges in valid_sample_data are not in graph?

EdgeDataLoader has a g_sampling argument that lets you specify the graph to sample neighbors from. So you can make another graph, graph_valid, that contains both the training edges and the validation edges, iterate over the validation edge IDs in graph_valid, and still sample neighbors from the original graph. Something like:

    valid_dataloader = dgl.dataloading.EdgeDataLoader(
        graph_valid, valid_eids, neighbor_sampler,
        g_sampling=graph,
        ...
    )
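
In case it helps, here is a rough sketch of one way to work out valid_eids when building graph_valid. It is plain Python and only prepares the edge lists. The helper name build_valid_edges and the edge type names are made up for illustration, and it assumes node names have already been mapped to integer IDs (for example through your vocab). It also relies on my understanding that DGL assigns edge IDs in the order the edges are passed in, so appending the validation rows after the rows from graph.txt gives them the trailing IDs of each edge type. Please check that against your DGL version.

```python
from collections import defaultdict


def build_valid_edges(graph_rows, valid_rows):
    """Hypothetical helper: merge graph edges and validation edges per edge type.

    graph_rows: iterable of (src, etype, dst) taken from graph.txt
    valid_rows: iterable of (src, etype, dst, label) taken from valid.txt
    Returns three dicts keyed by edge type:
      edges        -> (src list, dst list) for the combined graph
      valid_eids   -> positions of the validation edges within those lists
      valid_labels -> 0/1 labels aligned with valid_eids
    """
    edges = defaultdict(lambda: ([], []))
    valid_eids = defaultdict(list)
    valid_labels = defaultdict(list)

    # Existing graph edges go first so their positions stay unchanged.
    for src, etype, dst in graph_rows:
        edges[etype][0].append(src)
        edges[etype][1].append(dst)

    # Each validation edge is appended and its position is recorded.
    for src, etype, dst, label in valid_rows:
        src_list, dst_list = edges[etype]
        valid_eids[etype].append(len(src_list))
        valid_labels[etype].append(label)
        src_list.append(src)
        dst_list.append(dst)

    return dict(edges), dict(valid_eids), dict(valid_labels)


# Toy data with made-up edge types and integer node IDs.
graph_rows = [(0, "follows", 1), (1, "follows", 2), (0, "likes", 3)]
valid_rows = [(2, "follows", 0, 1), (1, "likes", 3, 0)]
edges, valid_eids, valid_labels = build_valid_edges(graph_rows, valid_rows)
```

The edges dict would then be converted into whatever you pass to dgl.heterograph to create graph_valid. Note that dgl.heterograph expects keys of the form (src_type, edge_type, dst_type), and your files only carry the edge type, so you would need to supply the node types yourself. This sketch also adds the label-0 rows as edges of graph_valid so that they can be scored in the same pass; that is my assumption, not a requirement.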

That works! Thanks a lot!
