How to customize NeighborSampler

Hi,

Thanks for sharing this great library. I want to know more about how (or whether I can) customize the Graph samplers.

In my application, I need to sample from a giant tree-structured graph. Specifically, given a query node in this tree, I want to sample from the pool of nodes that are within n hops of the query node but are not its descendants. For example, if n=2, I want to sample from the query node’s parent, siblings, and grandparent, but not its children or grandchildren. I checked the NeighborSampler above, which requires the neighborhood_type arg to be one of “in”, “out”, or “both”. This does not exactly match my application scenario: in a graph where each parent node points to its child nodes, the relation from the query node to its parent is “in”, while the relation from that parent to its other children (the query node’s siblings) is “out”.
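To make the target pool concrete, here is a minimal pure-Python sketch of the selection rule described above. It does not use DGL or NeighborSampler at all; the function names (`build_adjacency`, `non_descendant_pool`, `sample_non_descendants`) and the toy edge list are hypothetical, and it assumes the tree is given as (parent, child) pairs with sortable node IDs.

```python
import random
from collections import defaultdict, deque


def build_adjacency(edges):
    """Build parent->children and child->parents lookups from (parent, child) pairs."""
    children = defaultdict(list)
    parents = defaultdict(list)
    for parent, child in edges:
        children[parent].append(child)
        parents[child].append(parent)
    return children, parents


def non_descendant_pool(edges, query, n_hops):
    """Nodes within n_hops of query (ignoring edge direction), minus query and its descendants."""
    children, parents = build_adjacency(edges)

    # Collect all descendants by following only parent -> child ("out") edges.
    descendants = set()
    stack = [query]
    while stack:
        node = stack.pop()
        for child in children.get(node, []):
            if child not in descendants:
                descendants.add(child)
                stack.append(child)

    # Breadth-first search over both "in" and "out" edges, stopping at n_hops.
    dist = {query: 0}
    queue = deque([query])
    while queue:
        node = queue.popleft()
        if dist[node] == n_hops:
            continue
        for nbr in children.get(node, []) + parents.get(node, []):
            if nbr not in dist:
                dist[nbr] = dist[node] + 1
                queue.append(nbr)

    return sorted(v for v in dist if v != query and v not in descendants)


def sample_non_descendants(edges, query, n_hops, k, seed=None):
    """Draw up to k nodes uniformly without replacement from the pool."""
    pool = non_descendant_pool(edges, query, n_hops)
    rng = random.Random(seed)
    return rng.sample(pool, min(k, len(pool)))


# Toy tree (hypothetical): 0 is the root, 3 is the query node.
EDGES = [(0, 1), (0, 2), (1, 3), (1, 4), (3, 5), (3, 6), (2, 7)]
print(non_descendant_pool(EDGES, query=3, n_hops=2))  # [0, 1, 4]
```

With query node 3 and n=2, the pool is the parent (1), the sibling (4), and the grandparent (0); the children 5 and 6 are excluded even though they are only one hop away.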

Currently, I am doing network transformation on networkx graphs and then converting them into DGLGraphs. I assume the NeighborSampler’s underlying implementation will be much more efficient and thus wonder whether/how I can customize it to accommodate my application scenario.

Thanks for your time and help.

Hi,
Currently, the only way to customize a graph sampler is to write C++ code (you may change sampler.cc manually) and build dgl from source. This is not easy, and we are glad to answer your questions if you run into any trouble.

@zhengda1936 could you please check whether any of our current samplers could meet his needs?

Hi @zihao,

Thanks for the updates. I am still in the prototyping/experimenting stage, so I guess I don’t need to do C++-level optimization currently. Maybe I can find a way to do it more efficiently later. If @zhengda1936 can find some clever ways to use the current samplers, please kindly let me know.

Thanks so much for all of your help.

Hello @mickeystroller,

The current implementation of NeighborSampler is designed for sampling the nodes in the immediate neighborhood, but we are definitely interested in making it more flexible. In your use case, it seems you want to sample nodes from a multi-hop neighborhood. We can extend our sampler to do so. For now, please prototype your sampling strategy in Python.

Best,
Da

Hi @zhengda1936,

I think the current NeighborSampler does support multi-hop sampling if we set the arg num_hops > 1, right? The tricky thing here is that I need to combine both “in” and “out” in a complicated manner. But I think at the current stage, I can just prototype my sampling strategy in Python.

Thanks for your reply.