Hi,
I am building a recommender system using DGL. To train the model, I use EdgeDataLoader with the Uniform negative sampler, via the following code:
```python
sampler = dgl.dataloading.MultiLayerFullNeighborSampler(params['n_layers'])
sampler_n = dgl.dataloading.negative_sampler.Uniform(
    params['neg_sample_size']
)
edgeloader_train = dgl.dataloading.EdgeDataLoader(
    train_graph,
    train_eids_dict,
    sampler,
    exclude='reverse_types',
    reverse_etypes={'buys': 'bought-by', 'bought-by': 'buys',
                    'clicks': 'clicked-by', 'clicked-by': 'clicks'},
    negative_sampler=sampler_n,
    batch_size=train_params.edge_batch_size,
    shuffle=True,
    drop_last=False,
    pin_memory=True,
    num_workers=num_workers,
)
```
I am using a max-margin loss to train the model, and often use a negative sample size of around 1000 edges. I have around 300,000 users and 10,000 items. When generating negative edges, I would like to make sure that the sampler does not generate a "negative edge" that is actually a positive edge, i.e. an edge that is not the edge of interest, but that still exists as a real edge in the graph train_graph. In other words, when choosing a "random item" as the "negative item", I would like to make sure that the user of interest has not actually already interacted with that "random item".
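For reference, the kind of filtering I have in mind could be sketched in plain Python as rejection sampling against the set of known positive pairs (this is only an illustration with hypothetical names, not the DGL API; in practice it would have to be wrapped in a custom negative-sampler callable and vectorized):

```python
import random

def sample_filtered_negatives(positives, num_items, k, rng=None):
    """For each positive edge (u, v), draw k negative items v' such that
    (u, v') is NOT an existing edge, by rejection sampling.

    positives: list of (user, item) pairs that exist in the graph.
    Caveat: this loops forever if a user has interacted with every item,
    so a real implementation would need a cap on retries.
    """
    rng = rng or random.Random(0)
    pos_set = set(positives)  # O(1) membership checks
    negatives = []
    for u, _ in positives:
        drawn = 0
        while drawn < k:
            v_neg = rng.randrange(num_items)
            if (u, v_neg) not in pos_set:  # reject "false negatives"
                negatives.append((u, v_neg))
                drawn += 1
    return negatives

# Tiny example: 2 users, 4 items, 2 negatives per positive edge.
positives = [(0, 1), (0, 2), (1, 0)]
negs = sample_filtered_negatives(positives, num_items=4, k=2)
```

Every pair in `negs` is guaranteed to be absent from `positives`, which is exactly the property I want the negative sampler to have.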
When reading the docs of the Uniform negative sampler (link), I could not find any parameter or clear indication as to whether this was done by default.
> For each edge (u, v) of type (srctype, etype, dsttype), DGL generates k pairs of negative edges (u, v'), where v' is chosen uniformly from all the nodes of type dsttype.
Thus, does that mean that the chosen v' could be a dst node that is already connected to the src node u, and if yes, would you have any suggestion as to how to avoid that?
Thanks a lot in advance!