Excluding other positive edges when generating negative edges

Hi,
I am building a recommender system using DGL. To train the model, I use EdgeDataLoader with the Uniform negative sampler, as in the following code:

sampler = dgl.dataloading.MultiLayerFullNeighborSampler(params['n_layers'])
sampler_n = dgl.dataloading.negative_sampler.Uniform(
    params['neg_sample_size']
)

edgeloader_train = dgl.dataloading.EdgeDataLoader(
    train_graph,
    train_eids_dict,
    sampler,
    exclude='reverse_types',
    reverse_etypes={'buys': 'bought-by', 'bought-by': 'buys',
                    'clicks': 'clicked-by', 'clicked-by': 'clicks'},
    negative_sampler=sampler_n,
    batch_size=train_params.edge_batch_size,
    shuffle=True,
    drop_last=False,
    pin_memory=True,
    num_workers=num_workers,
)

I am using a max-margin loss to train the model, and often use a negative sample size of around 1,000 edges. I have around 300,000 users and 10,000 items. When generating negative edges, I would like to make sure that the model does not generate a “negative edge” that is actually a positive edge, i.e. an edge that is not the edge of interest but that still exists as a real edge in train_graph. In other words, when choosing a “random item” as the “negative item”, I would like to make sure that the user of interest has not already interacted with that “random item”.

When reading the docs of the Uniform negative sampler (link), I could not find any parameter or clear indication of whether this is done by default:

For each edge (u, v) of type (srctype, etype, dsttype), DGL generates k pairs of negative edges (u, v'), where v' is chosen uniformly from all the nodes of type dsttype.

Thus, does that mean that the chosen v' could be a dst node that is already connected to the src node u? If so, would you have any suggestion on how to avoid that?
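To illustrate what I mean with a toy sketch (a hypothetical 4-node homogeneous graph, not my actual data):

import torch
import dgl

# Toy graph: user node 0 already interacts with item nodes 1, 2 and 3.
g = dgl.graph((torch.tensor([0, 0, 0]), torch.tensor([1, 2, 3])))
neg_sampler = dgl.dataloading.negative_sampler.Uniform(5)
# Draw 5 negative destinations per positive edge, uniformly over all nodes.
neg_src, neg_dst = neg_sampler(g, torch.arange(g.num_edges()))
# If exclusion is not done by default, some of these can be True:
print(g.has_edges_between(neg_src, neg_dst))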

Thanks a lot in advance!

The negative sampler does not exclude positive edges. The reason is that checking edge existence between two nodes is costly compared to negative sampling itself, while noise contrastive estimation would (theoretically) converge to the same solution whether or not positive examples are excluded. For instance, word2vec does not exclude the cases where a negatively sampled word actually appeared in the context.

That being said, if you still want to exclude the positive edges, you can instead compute a mask on the negative graph indicating whether each edge exists in the original graph. The original node IDs in the negative graph are

nids = neg_graph.ndata[dgl.NID]

So the source-destination pairs in the negative graph are

neg_src, neg_dst = neg_graph.edges()
neg_src = nids[neg_src]
neg_dst = nids[neg_dst]

The mask on the negative edges is simply

neg_mask = g.has_edges_between(neg_src, neg_dst)
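On a heterogeneous graph like yours, note that neg_graph.ndata[dgl.NID] is a dict keyed by node type, and has_edges_between needs the edge type. A per-edge-type sketch of the same idea (assuming the negative graph preserves your canonical edge types) would be:

neg_masks = {}
nids = neg_graph.ndata[dgl.NID]             # dict: node type -> original IDs
for can_etype in neg_graph.canonical_etypes:  # e.g. ('user', 'buys', 'item')
    utype, _, vtype = can_etype
    neg_src, neg_dst = neg_graph.edges(etype=can_etype)
    neg_masks[can_etype] = train_graph.has_edges_between(
        nids[utype][neg_src], nids[vtype][neg_dst], etype=can_etype)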

When computing the loss, you can simply drop the terms where neg_mask is 1 (i.e. True).
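For example, a minimal sketch of a masked max-margin loss (assuming neg_score stores the k negative scores of each positive edge consecutively, which matches how Uniform repeats each source node k times):

import torch

def masked_max_margin_loss(pos_score, neg_score, neg_mask, k, margin=1.0):
    # pos_score: (E,), neg_score and neg_mask: (E * k,)
    pos = pos_score.repeat_interleave(k)          # align each positive with its k negatives
    hinge = torch.relu(neg_score - pos + margin)  # standard max-margin term
    keep = (~neg_mask).float()                    # zero out false negatives
    return (hinge * keep).sum() / keep.sum().clamp(min=1.0)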

@BarclayII Thanks for your clear answer.

The mask alternative seems well suited to my case; I will try it out.

Just as a follow-up, I would argue that since the number of items (i.e. dst nodes) is fairly small (~10,000), excluding positive edges might have a greater impact on the result than it does for NLP techniques such as word2vec.
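As a rough illustration (with a made-up per-user count): if a user has already interacted with 100 of the 10,000 items, each uniformly drawn negative is a true positive with probability 100 / 10,000 = 1%, so a negative sample size of 1,000 would yield about 10 false negatives per positive edge in expectation.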

Thanks again!
