How does DGL reconstruct graph after sampling stage?

lwwlwwl · February 21, 2024, 3:58pm

Hi, I am trying to understand how DGL reconstructs the subgraph after sampling the neighborhood of target nodes. Let's say the target nodes are [0,1,2,3]. If we sample up to 3 neighbors for each target node, we get the following:

target node id: sampled nbrs
0: 2, 4, 5
1: 3, 6
2: 6, 4, 3
3: 0

Then NeighborSampler.sample_blocks returns seed_nodes, output_nodes, and blocks, where seed_nodes is the unique list of sampled neighbors ([0,2,3,4,5,6] in this case), output_nodes is the original target nodes ([0,1,2,3]), and blocks should include the metadata ([Block(num_src_nodes=6, num_dst_nodes=4, num_edges=9)] in this case). Given these three outputs, how is DGL able to tell which neighbors belong to which target node? For example, 4 is a neighbor of both 0 and 2, but where is this information stored?

I also see that the training script reads the entire edge list into the graph. Does that mean DGL goes back to memory to fetch neighborhood information whenever it is needed?
Thank you in advance!
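To make the example concrete, here is a plain-Python sketch (not DGL's implementation) that relabels the sampled table above into a block-style edge list. The neighbor-to-target information lives in the paired ID lists: edge i connects source node src_local[i] to destination node dst_local[i]. build_block and its include_dst_in_src flag are hypothetical names; the flag mirrors the include_dst_in_src option of dgl.to_block, which defaults to True and places the destination nodes first among the source nodes. With DGL's defaults, a block for this example would therefore count node 1 as a source node too (7 source nodes rather than 6). The sketch appends new neighbors in order of first appearance, which is an assumption and may not match DGL's exact ordering.

```python
# Sampled in-neighbors per target (destination) node, from the example above.
sampled = {0: [2, 4, 5], 1: [3, 6], 2: [6, 4, 3], 3: [0]}

def build_block(sampled, include_dst_in_src=True):
    """Relabel sampled edges into block-local IDs (hypothetical sketch)."""
    dst_nodes = list(sampled)                 # output nodes keep their order
    src_nodes = list(dst_nodes) if include_dst_in_src else []
    src_pos = {n: i for i, n in enumerate(src_nodes)}
    src_local, dst_local = [], []
    for d_idx, d in enumerate(dst_nodes):
        for s in sampled[d]:
            if s not in src_pos:              # first time this neighbor appears
                src_pos[s] = len(src_nodes)
                src_nodes.append(s)
            src_local.append(src_pos[s])      # one (src, dst) pair per sampled edge
            dst_local.append(d_idx)
    return src_nodes, dst_nodes, src_local, dst_local

src_nodes, dst_nodes, src_local, dst_local = build_block(sampled)
print(len(src_nodes), len(dst_nodes), len(src_local))  # 7 4 9
print(src_nodes)                                        # [0, 1, 2, 3, 4, 5, 6]

# Each target's neighbors can be read back from the edge list alone.
for d_idx, d in enumerate(dst_nodes):
    nbrs = [src_nodes[s] for s, t in zip(src_local, dst_local) if t == d_idx]
    print(d, nbrs)                                      # e.g. 2 [6, 4, 3]
```

Passing include_dst_in_src=False to the sketch gives the 6 source nodes [2, 4, 5, 3, 6, 0], which matches the count in the question.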

Rhett-Ying · February 22, 2024, 1:00am

Sampling produces a subgraph that contains only the sampled neighboring edges, so each (neighbor, target) pair is stored as an edge of that subgraph. See more details in https://github.com/dmlc/dgl/blob/364cb7186e94630eb7dc30cd2f494feee1218f8a/python/dgl/sampling/neighbor.py#L210
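The idea of "a subgraph with only the sampled edges" can be illustrated with a small sketch of the sampling step itself. It assumes a hypothetical full graph given as global-ID edge lists (the extra in-neighbors 7 and 8 are invented for illustration). For each seed it keeps at most fanout randomly chosen in-edges and drops the rest, so the result still uses global node IDs but contains only sampled edges. sample_frontier is a hypothetical name; DGL's real sampler is the one linked above, and options such as sampling with replacement or edge weights are not modeled.

```python
import random
from collections import defaultdict

def sample_frontier(src, dst, seeds, fanout, rng):
    """Keep at most `fanout` random in-edges per seed (hypothetical sketch)."""
    in_edges = defaultdict(list)             # destination node -> its edge IDs
    for eid, d in enumerate(dst):
        in_edges[d].append(eid)
    kept = []
    for s in seeds:
        eids = in_edges[s]
        if len(eids) > fanout:               # sample without replacement
            eids = rng.sample(eids, fanout)
        kept.extend(eids)
    # The sampled "subgraph" is just the surviving edges, still in global IDs.
    return [src[e] for e in kept], [dst[e] for e in kept]

# Hypothetical full graph: edge i goes from src[i] to dst[i].
src = [2, 4, 5, 7, 3, 6, 6, 4, 3, 8, 0]
dst = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 3]

sub_src, sub_dst = sample_frontier(src, dst, [0, 1, 2, 3], fanout=3,
                                   rng=random.Random(0))
print(len(sub_src))                    # 9 (3 + 2 + 3 + 1 sampled edges)
print(sorted(zip(sub_dst, sub_src)))   # (target, neighbor) pairs
```

Nodes 1 and 3 have fewer in-edges than the fanout, so all of their edges are kept; nodes 0 and 2 each lose one randomly chosen edge.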

Rhett-Ying · February 22, 2024, 2:47am

Since DGL 2.0, GraphBolt has been introduced, so please refer to the GraphBolt code path. Here's the entry point for neighbor sampling: https://github.com/dmlc/dgl/blob/00f33224038924c40229bb9c6f8dbe6d0b083960/python/dgl/graphbolt/impl/fused_csc_sampling_graph.py#L549
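As its name suggests, GraphBolt's FusedCSCSamplingGraph (the class in the linked file) keeps the graph in compressed sparse column (CSC) form. The sketch below is a plain-Python illustration, not GraphBolt's implementation, of why that layout suits neighbor sampling: the in-neighbors of node v are the contiguous slice indices[indptr[v]:indptr[v + 1]], so sampling a seed only reads that slice of the in-memory arrays. to_csc, sample_csc, and the small 9-node graph are hypothetical; the fanout handling assumes sampling without replacement.

```python
import random

def to_csc(num_nodes, src, dst):
    """Build CSC arrays: in-neighbors of v are indices[indptr[v]:indptr[v + 1]]."""
    indptr = [0] * (num_nodes + 1)
    for d in dst:                        # count the in-degree of each column
        indptr[d + 1] += 1
    for v in range(num_nodes):           # prefix sum -> column start offsets
        indptr[v + 1] += indptr[v]
    indices = [0] * len(src)
    fill = indptr[:-1]                   # next free slot per column (slice copy)
    for s, d in zip(src, dst):
        indices[fill[d]] = s
        fill[d] += 1
    return indptr, indices

def sample_csc(indptr, indices, seeds, fanout, rng):
    """Sample up to `fanout` in-neighbors per seed; return a CSC over the seeds."""
    out_indptr, out_indices = [0], []
    for v in seeds:
        nbrs = indices[indptr[v]:indptr[v + 1]]   # one contiguous slice
        if len(nbrs) > fanout:
            nbrs = rng.sample(nbrs, fanout)
        out_indices.extend(nbrs)
        out_indptr.append(len(out_indices))
    return out_indptr, out_indices

# Hypothetical 9-node graph: edge i goes from src[i] to dst[i].
src = [2, 4, 5, 7, 3, 6, 6, 4, 3, 8, 0]
dst = [0, 0, 0, 0, 1, 1, 2, 2, 2, 2, 3]
indptr, indices = to_csc(9, src, dst)
print(indptr)    # [0, 4, 6, 10, 11, 11, 11, 11, 11, 11]
print(indices)   # [2, 4, 5, 7, 3, 6, 6, 4, 3, 8, 0]

out_indptr, out_indices = sample_csc(indptr, indices, [0, 1, 2, 3], 3,
                                     random.Random(0))
print(out_indptr)  # [0, 3, 5, 8, 9]
```

In this layout the full CSC arrays stay in memory, and each seed reads only its own slice, so the sampled result for a mini-batch is again a small CSC whose offsets say which neighbors belong to which seed.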
