Hi,
I am building a recommender system, using link prediction on a HeteroGraph with 8 types of relations and 3 types of nodes. To do so, I am using EdgeDataLoader.
I would like to understand better how edges are removed or not from the computation graph. In the docs (link), it is written that “the sampled edges as well as their reverse edges are removed from computation dependencies of the incident nodes. This is a common trick to avoid information leakage.”
Does this mean that, with the default parameters, sampled edges are removed from the computation graph? If I were to create the EdgeDataLoader below, would it remove the sampled edges?
```python
edgeloader_train = dgl.dataloading.EdgeDataLoader(
    train_graph,
    train_eids_dict,
    sampler,
    negative_sampler=sampler_n,
    batch_size=edge_batch_size,
    shuffle=True,
    drop_last=False,
    pin_memory=True,
    num_workers=num_workers,
)
```
Or do I need to specify the “exclude” argument? And if I set “exclude” to “reverse_types” and pass a “reverse_etypes” mapping, do I also need to provide the edge ids?
```python
edgeloader_train = dgl.dataloading.EdgeDataLoader(
    train_graph,
    train_eids_dict,
    sampler,
    exclude='reverse_types',  # If I use this, do I need to specify the edge ids?
    reverse_etypes={
        'buys': 'bought-by',
        'bought-by': 'buys',
        'clicks': 'clicked-by',
        'clicked-by': 'clicks',
    },
    negative_sampler=sampler_n,
    batch_size=train_params.edge_batch_size,
    shuffle=True,
    drop_last=False,  # Drop last batch if non-full
    pin_memory=True,  # Helps the transfer to GPU
    num_workers=num_workers,
)
```
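To make the question concrete, here is a plain-Python sketch of what I would expect the exclusion to compute for one minibatch. It is not DGL code: the helper `edges_to_exclude` is hypothetical, and it assumes that the reverse of edge `i` of type `buys` is edge `i` of type `bought-by` (same id, reverse type). If that assumption holds, no separate list of edge ids should be needed.

```python
from collections import defaultdict


def edges_to_exclude(batch_eids, reverse_etypes):
    """Hypothetical helper: edge ids per edge type that I expect to be excluded.

    Assumption (not confirmed): a reverse edge has the same id as its
    forward edge, under the edge type given by `reverse_etypes`.
    """
    excluded = defaultdict(set)
    for etype, eids in batch_eids.items():
        # The sampled edges themselves.
        excluded[etype].update(eids)
        # Their reverse edges, if a reverse type is declared.
        rev = reverse_etypes.get(etype)
        if rev is not None:
            excluded[rev].update(eids)
    return dict(excluded)


reverse_etypes = {
    'buys': 'bought-by',
    'bought-by': 'buys',
    'clicks': 'clicked-by',
    'clicked-by': 'clicks',
}
# Made-up minibatch: edge ids sampled per edge type.
batch = {'buys': [0, 3], 'clicked-by': [7]}
result = edges_to_exclude(batch, reverse_etypes)
```

With this made-up batch, I would expect ids 0 and 3 to be dropped from both `buys` and `bought-by`, and id 7 from both `clicked-by` and `clicks`.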
Also, if I understood correctly, this means that the edges are removed from the computation graph (to prevent the model from, e.g., learning to predict high ratings for pairs of nodes simply because they are connected in the graph), but they are still present in the positive_graph generated by the dataloader.
I would like to make sure that the sampled edges are not in the computation graph; that way, I assume the model will generalize better to unseen data.
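The check I have in mind could look roughly like the sketch below. It is plain Python rather than DGL: `leaked_edges` is a hypothetical helper, and it assumes the edges of the positive graph and of the message-passing blocks have already been converted to `(etype, src, dst)` triples in the original graph's node ids (blocks relabel nodes, so the ids would first need to be mapped back, e.g. via `dgl.NID`).

```python
def leaked_edges(positive_edges, message_edges, reverse_etypes):
    """Hypothetical check: edges used for message passing that should have
    been excluded, i.e. sampled positive edges or their reverses.

    All edges are (etype, src, dst) triples in original node ids.
    """
    forbidden = set(positive_edges)
    for etype, src, dst in positive_edges:
        rev = reverse_etypes.get(etype)
        if rev is not None:
            # The reverse edge swaps endpoints and uses the reverse type.
            forbidden.add((rev, dst, src))
    return forbidden & set(message_edges)


reverse_etypes = {'buys': 'bought-by', 'bought-by': 'buys'}
# Made-up example: user 1 buys item 10 is the edge being predicted.
positive = {('buys', 1, 10)}
message = {('buys', 2, 10), ('bought-by', 10, 1)}
leaks = leaked_edges(positive, message, reverse_etypes)
```

In this made-up example the reverse edge `('bought-by', 10, 1)` is still used for message passing, so it is reported as a leak; an empty result would mean none of the sampled edges or their reverses feed into the computation.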
Thanks a lot in advance!