Hard Negative Sampling/Mining


I’d like to do hard negative mining as follows:
Given a positive edge (u, v1), the negative edge will be (u, v2), where v2 is a 2-hop neighbor of u (but not a direct neighbor of u).

I can generate and save all the 2-hop neighbors of each node offline and just load them in a custom negative sampler. But as my graph is quite large (2M nodes, 80M undirected edges), this generation would take a lot of time (a few weeks), which is unacceptable.

Is there a better/faster option? Is there a way in DGL to get a random 2-hop neighbor of a node?


You can either use a random walk to generate paths starting from u and then extract your negative nodes, or use neighbor sampling with u as the seed node to collect all its 2-hop neighbors.

But you need to mask the 1-hop neighbors manually. You can use has_edges_between to do that.
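
To make the walk-then-mask idea concrete, here is a rough sketch in plain Python rather than DGL code. The adjacency dict `adj`, the function name `sample_two_hop_negative`, and the `max_tries` cap are illustrative assumptions, not part of any library. The membership test against u's direct neighbors plays the role that the has_edges_between check would play in DGL.

```python
import random

def sample_two_hop_negative(adj, u, max_tries=20, rng=random):
    """Return a node exactly two hops from u, or None if no valid node was found.

    adj maps each node to a list of its neighbors (undirected graph).
    This is rejection sampling: walk two steps, discard invalid endpoints.
    """
    direct = set(adj[u])          # 1-hop neighbors to mask out
    if not direct:
        return None               # isolated node, no walk possible
    for _ in range(max_tries):
        v1 = rng.choice(adj[u])   # first hop
        v2 = rng.choice(adj[v1])  # second hop (adj[v1] contains at least u)
        # Reject the walk if it returned to u or ended on a direct neighbor.
        if v2 != u and v2 not in direct:
            return v2
    return None                   # caller decides on a fallback, e.g. a uniform negative
```

The number of retries depends on how many second-hop candidates are already neighbors of u, so in dense neighborhoods a cap like `max_tries` with a fallback negative keeps the sampler from stalling.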

A random walk of length 2 starting from node u may end up at a 1-hop neighbor of u or u itself.

One option (as you said above) is to check whether the final node after the random walk is connected to u. But in this case I’ll have to repeat the random walk until I get a valid negative node.

I need the walk to always move “outwards” from u. I get that this isn’t a traditional random walk, but is this possible with DGL?

I feel the node2vec random walk fits what you are looking for. We recently added it to DGL (dgl.sampling.node2vec_random_walk; see the DGL 0.7 documentation). To prefer walking “outwards”, you can set p to a large value and q to a small value: a large p makes stepping straight back to u unlikely, and a small q favors nodes that are farther from u.
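
To see why those settings push the walk outwards, here is a small plain-Python sketch of the second-step weights from the standard node2vec definition (1/p to return, 1 for a node adjacent to the previous node, 1/q for a node two hops away). It assumes DGL's implementation follows that definition. The helper names and the toy graph are hypothetical and only for illustration.

```python
def node2vec_step_weights(adj, prev, cur, p, q):
    """Unnormalized node2vec weights for the step leaving `cur`, having come from `prev`."""
    prev_nbrs = set(adj[prev])
    weights = {}
    for nxt in adj[cur]:
        if nxt == prev:
            weights[nxt] = 1.0 / p   # step straight back to the start node
        elif nxt in prev_nbrs:
            weights[nxt] = 1.0       # lands on a 1-hop neighbor of the start node
        else:
            weights[nxt] = 1.0 / q   # lands two hops away from the start node
    return weights

def two_hop_probability(adj, prev, cur, p, q):
    """Probability that the step from `cur` ends two hops away from `prev`."""
    prev_nbrs = set(adj[prev])
    w = node2vec_step_weights(adj, prev, cur, p, q)
    far = sum(v for n, v in w.items() if n != prev and n not in prev_nbrs)
    return far / sum(w.values())

# Toy undirected graph: from u = 0, nodes 3 and 4 are the true 2-hop neighbors.
adj = {0: [1, 2], 1: [0, 2, 3], 2: [0, 1, 4], 3: [1], 4: [2]}
uniform = two_hop_probability(adj, prev=0, cur=1, p=1.0, q=1.0)
biased = two_hop_probability(adj, prev=0, cur=1, p=100.0, q=0.01)
```

Note that the bias only makes a valid endpoint very likely, not certain: a 1-hop neighbor of u still has weight 1, so the endpoint should still be checked against u's direct neighbors before it is used as a negative.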
