Hi
I have a dataloader for a link prediction task as follows:
```python
sampler = dgl.dataloading.MultiLayerFullNeighborSampler(n_layers)
sampler = dgl.dataloading.as_edge_prediction_sampler(
    sampler,
    exclude='reverse_types',
    reverse_etypes={'listened': 'listened-by', 'listened-by': 'listened'},
    negative_sampler=dgl.dataloading.negative_sampler.Uniform(10))
```
.......
```python
def train_dataloader(self):
    return dgl.dataloading.DataLoader(
        self.train_graph,
        self.train_idx,
        self.sampler,
        batch_size=self.batch_size,
        # drop_last=False,
        num_workers=0
    )

def val_dataloader(self):
    return dgl.dataloading.DataLoader(
        self.valid_graph,
        self.val_idx,
        self.sampler,
        device=self.device,
        batch_size=self.batch_size,
        num_workers=0
    )
```
Based on this dataloader, what should train_graph and valid_graph look like?
Are the indices I provide to the dataloader the supervision edges? If so, should train_graph contain only part of the training edges as message-passing edges, with the remaining training edges given to the dataloader as the supervision edge IDs?
And for validation, would I keep all of the training edges in valid_graph as message-passing edges, pass the validation edge IDs as the (hidden) supervision edges, and otherwise do the same?
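Concretely, here is a minimal sketch of how I am picturing the two graphs and the edge-ID dicts. The node/edge type names ('user', 'track') and the random toy data are just placeholders for my real graph, so please treat this as my assumption about the setup rather than what I actually run:

```python
import torch
import dgl

# Toy stand-in for my real user-track interactions.
num_users, num_tracks, num_edges = 100, 50, 1000
src = torch.randint(0, num_users, (num_edges,))
dst = torch.randint(0, num_tracks, (num_edges,))

# Random train / validation split of the interactions.
perm = torch.randperm(num_edges)
n_val = int(0.1 * num_edges)
val_mask = torch.zeros(num_edges, dtype=torch.bool)
val_mask[perm[:n_val]] = True

# train_graph: only the training interactions (validation edges are absent).
# Reverse edges are added in the same order, so edge i of 'listened' and
# edge i of 'listened-by' are reverses of each other, which is what the
# reverse_etypes mapping in the sampler assumes.
s_tr, d_tr = src[~val_mask], dst[~val_mask]
train_graph = dgl.heterograph({
    ('user', 'listened', 'track'): (s_tr, d_tr),
    ('track', 'listened-by', 'user'): (d_tr, s_tr),
}, num_nodes_dict={'user': num_users, 'track': num_tracks})

# All training 'listened' edges serve as supervision (seed) edges; my
# understanding is that exclude='reverse_types' drops each minibatch's seed
# edges and their reverse copies from the sampled message-passing blocks.
train_idx = {'listened': torch.arange(train_graph.num_edges('listened'))}

# valid_graph: training edges first, validation edges appended last, so the
# validation supervision edges are simply the last n_val 'listened' edge IDs.
s_all = torch.cat([s_tr, src[val_mask]])
d_all = torch.cat([d_tr, dst[val_mask]])
valid_graph = dgl.heterograph({
    ('user', 'listened', 'track'): (s_all, d_all),
    ('track', 'listened-by', 'user'): (d_all, s_all),
}, num_nodes_dict={'user': num_users, 'track': num_tracks})
n_train = train_graph.num_edges('listened')
val_idx = {'listened': torch.arange(n_train, n_train + n_val)}
```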
Right now my validation loss is lower than my training loss, which makes me suspect some kind of leakage. Please let me know whether what I described is correct so I can fix my implementation! As a sanity check, I was also planning to verify that no validation edge is present in train_graph, roughly as sketched below.
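This check assumes the graph construction from the sketch above; find_edges and has_edges_between are standard DGLGraph methods:

```python
# No validation 'listened' pair should also exist as an edge in train_graph.
u, v = valid_graph.find_edges(val_idx['listened'], etype='listened')
assert not train_graph.has_edges_between(u, v, etype='listened').any()
```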
Kind regards,
Ece