V2: Implementing LADIES sampler

Hello there!
I’m trying to implement the LADIES sampler.
Essentially, the method iteratively samples the nodes for each message-passing layer, with probabilities based on their connectivity to the nodes selected in the previous layer. I’m having trouble doing this with the current DGL library (I have v0.8.2).

Following the GitHub implementation of the paper’s original authors, the process for selecting the probabilities for one layer should go as follows:

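# Rows of the (sparse) Laplacian for the nodes of the previous layer
U = lap_matrix[seed_nodes, :]
# Squared column norms of the selected rows
pi = np.array(np.sum(U.multiply(U), axis=0))[0]
p = pi / np.sum(pi)
# Cannot sample more nodes than have nonzero probability
s_num = np.min([np.sum(p > 0), layer_samples])
after_nodes = np.random.choice(g.num_nodes(), s_num, p=p, replace=False)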

This gives a vector p, where p[i] is the probability of node i being selected in this layer. We can then use these probabilities to select the nodes for the next layer. To build a frontier for a single layer, I used the following procedure:

nodes = torch.unique(torch.concat((torch.tensor(after_nodes), seed_nodes)))
sg = dgl.node_subgraph(g, nodes)
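For reference, the two snippets above can be checked on a toy graph without DGL. This is only a minimal sketch: the 5-node path graph, the row normalization, and the helper name sample_layer are assumptions made for illustration, and np.union1d stands in for the torch.unique(torch.concat(...)) call.

```python
import numpy as np
import scipy.sparse as sp

def sample_layer(lap_matrix, seed_nodes, layer_samples, rng):
    """Hypothetical helper: sample one layer, returning (after_nodes, p)."""
    U = lap_matrix[seed_nodes, :]
    # Squared column norms of the rows that belong to the seed nodes.
    pi = np.asarray(U.multiply(U).sum(axis=0)).ravel()
    p = pi / pi.sum()
    # Never ask for more nodes than have nonzero probability.
    s_num = min(int(np.count_nonzero(p)), layer_samples)
    after_nodes = rng.choice(p.size, size=s_num, replace=False, p=p)
    return after_nodes, p

# Toy path graph 0-1-2-3-4 with self-loops, row-normalized (an assumption).
dense = np.array([[1, 1, 0, 0, 0],
                  [1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 0, 1, 1]], dtype=float)
lap_matrix = sp.csr_matrix(dense / dense.sum(axis=1, keepdims=True))

seed_nodes = np.array([0])
rng = np.random.default_rng(0)
after_nodes, p = sample_layer(lap_matrix, seed_nodes, layer_samples=3, rng=rng)

# Frontier nodes: sampled nodes plus the seed nodes, deduplicated and sorted.
nodes = np.union1d(after_nodes, seed_nodes)
```

Only nodes 0 and 1 are connected to the seed node 0 here, so p is nonzero only for those two nodes and s_num is capped at 2 even though 3 samples were requested.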

However, doing this manually doesn’t seem efficient, and it doesn’t work as I expect because the result isn’t properly formulated as a block (it throws errors when I try to use dgl.to_block(sg, dst_nodes=seed_nodes, src_nodes=after_nodes)).
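One possible way around the subgraph step is to build the bipartite edge list for the block directly from the sparse matrix. The sketch below is an assumption-laden illustration, not a tested DGL recipe: bipartite_edges is a hypothetical helper, rows of the matrix are treated as destination nodes, and the seed nodes are placed first among the source nodes because, as far as I know, DGL blocks expect the destination nodes to come first. The resulting arrays could then presumably be passed to dgl.create_block, which is not exercised here.

```python
import numpy as np
import scipy.sparse as sp

def bipartite_edges(adj, seed_nodes, sampled_nodes):
    """Hypothetical helper: edges from the sampled layer into the seed nodes,
    relabeled to local ids. Assumes seed_nodes contains no duplicates."""
    seed_nodes = np.asarray(seed_nodes)
    # Source side: seed nodes first, then the sampled nodes that are not seeds.
    extra = np.setdiff1d(sampled_nodes, seed_nodes)
    src_nodes = np.concatenate([seed_nodes, extra])
    # Rows are destinations (seed nodes), columns are sources.
    sub = adj[seed_nodes, :][:, src_nodes].tocoo()
    src_local, dst_local = sub.col, sub.row
    return src_nodes, src_local, dst_local

# Toy path graph 0-1-2-3-4 with self-loops (an assumption for illustration).
adj = sp.csr_matrix(np.array([[1, 1, 0, 0, 0],
                              [1, 1, 1, 0, 0],
                              [0, 1, 1, 1, 0],
                              [0, 0, 1, 1, 1],
                              [0, 0, 0, 1, 1]], dtype=float))

src_nodes, src_local, dst_local = bipartite_edges(
    adj, seed_nodes=[2], sampled_nodes=[1, 3])
edges = sorted(zip(src_local.tolist(), dst_local.tolist()))

# Untested assumption: a block could then be created with something like
# dgl.create_block((src_local, dst_local),
#                  num_src_nodes=len(src_nodes), num_dst_nodes=1)
```

Here src_nodes maps local source ids back to the original node ids, so features can be gathered with it before message passing.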

Intuitively, I would like to use dgl.sampling.sample_neighbors(), but this presents its own challenges:

  1. We need to associate probabilities with edges, but since I make the graph undirected via dgl.to_bidirected(), the edges are doubled, and associating a probability with two different eids seems cumbersome and inefficient.
  2. The purpose of this probability is that it can be adaptive. That is, with each new layer, the probabilities are updated based on the dst nodes of the previous layer. Continuously updating all the edge data in a graph also seems inefficient.
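Both points may be avoidable if the probabilities are kept per node rather than per edge, as in the snippet from the original authors. The sketch below recomputes p for each layer from the rows of the current seed nodes, so nothing is stored on edges or updated in the graph. The helper name sample_layers, the toy graph, and the row normalization are assumptions for illustration.

```python
import numpy as np
import scipy.sparse as sp

def sample_layers(lap_matrix, batch_nodes, samples_per_layer, rng):
    """Hypothetical helper: return the node set chosen for each layer."""
    seed_nodes = np.asarray(batch_nodes)
    layers = []
    for layer_samples in samples_per_layer:
        # Per-node probabilities depend only on the current seed nodes.
        U = lap_matrix[seed_nodes, :]
        pi = np.asarray(U.multiply(U).sum(axis=0)).ravel()
        p = pi / pi.sum()
        s_num = min(int(np.count_nonzero(p)), layer_samples)
        sampled = rng.choice(p.size, size=s_num, replace=False, p=p)
        # Keep the seed nodes in the next layer, as in the frontier code above.
        seed_nodes = np.union1d(sampled, seed_nodes)
        layers.append(seed_nodes)
    return layers

# Toy path graph 0-1-2-3-4 with self-loops, row-normalized (an assumption).
dense = np.array([[1, 1, 0, 0, 0],
                  [1, 1, 1, 0, 0],
                  [0, 1, 1, 1, 0],
                  [0, 0, 1, 1, 1],
                  [0, 0, 0, 1, 1]], dtype=float)
lap_matrix = sp.csr_matrix(dense / dense.sum(axis=1, keepdims=True))

layers = sample_layers(lap_matrix, batch_nodes=[0],
                       samples_per_layer=[2, 3],
                       rng=np.random.default_rng(0))
```

In this toy case each layer requests exactly as many nodes as have nonzero probability, so the result is deterministic: the node set grows from the batch node 0 by one hop per layer.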

Any advice in successfully completing this implementation would be super helpful! Thank you in advance!

I have an old implementation of the LADIES sampler in [Example][WIP] Layer-dependent Importance Sampling by BarclayII · Pull Request #2242 · dmlc/dgl · GitHub, which was probably written for DGL 0.4. The first version would be identical to yours, while the second (which uses DGL’s message passing) and the third (which uses scipy) don’t have to update the probabilities of all the edges. You might need to change a few places to make it work with the latest DGL version, and I haven’t had the time to change it myself yet.

Thank you! I will check it out and follow up if I successfully get it working!