I want to build a link prediction model on a heterogeneous graph and have prepared the following data files:
- graph.txt: “src_node edge_type dst_node”, used to build the graph (graph).
- train.txt: “src_node edge_type dst_node label”, used as train_sample_data. It is a subset of graph.txt, and “label” is always 1.
- valid.txt: “src_node edge_type dst_node label”, used as valid_sample_data. Labels are 0 or 1, and the edges (src_node edge_type dst_node) with label 1 are not included in graph.txt.
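To make the file formats concrete, here is a minimal sketch of how such rows could be parsed. It is not my actual loading code: it assumes whitespace-separated fields, `Sample` and `parse_lines` are illustrative names, and the toy rows are made up for the example.

```python
from typing import Iterable, List, NamedTuple, Optional


class Sample(NamedTuple):
    src: str
    etype: str
    dst: str
    label: Optional[int]  # None for graph.txt rows, which carry no label


def parse_lines(lines: Iterable[str], has_label: bool) -> List[Sample]:
    """Parse rows of the form 'src_node edge_type dst_node [label]'."""
    expected = 4 if has_label else 3
    samples = []
    for line in lines:
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        if len(parts) != expected:
            raise ValueError(f"expected {expected} fields, got {len(parts)}: {line!r}")
        label = int(parts[3]) if has_label else None
        samples.append(Sample(parts[0], parts[1], parts[2], label))
    return samples


# Toy rows (hypothetical data) in the graph.txt and valid.txt formats.
graph_edges = parse_lines(["u1 clicks i1", "u2 clicks i2"], has_label=False)
valid_samples = parse_lines(["u1 clicks i2 1", "u2 clicks i1 0"], has_label=True)

# Check the stated property: positive validation edges are absent from the graph.
graph_set = {(s.src, s.etype, s.dst) for s in graph_edges}
leaked = [s for s in valid_samples
          if s.label == 1 and (s.src, s.etype, s.dst) in graph_set]
```

With real files, the same function could be called on an open file handle, since it only iterates over lines.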
Then I built the training dataloader with EdgeDataLoader like this:
    # returns the eids of the edges in train_sample_data
    train_eids = generate_edges(graph, vocab, train_sample_data, num_workers)
    train_dataloader = dgl.dataloading.EdgeDataLoader(
        graph, train_eids, neighbor_sampler,
        negative_sampler=dgl.dataloading.negative_sampler.Uniform(5),
        batch_size=args.batch_size,
        shuffle=True,
        drop_last=False,
        pin_memory=True,
        num_workers=args.workers
    )
My question is: how do I do validation in this situation, given that the edges in valid_sample_data are not in graph?