How to get the neighbors of each node in a minibatch, or how to manipulate the block graph

yichuan520030910320 · August 14, 2023, 5:27am

When using DGL, if we use the following method to obtain a minibatch:

for it, (input_nodes, output_nodes, blocks) in enumerate(train_dataloader):

blocks[0].dstdata['_ID'] and blocks[0].srcdata['_ID'] both return node IDs after "fusion" (de-duplication by DGL). How can I tell which destination node each node in blocks[0].srcdata['_ID'] was sampled for, without the de-duplication getting in the way? Alternatively, is there a way to remove specific nodes from the bipartite graph blocks[0]? I only want to keep some of the nodes in blocks[0].dstdata and remove the others together with their corresponding source nodes. Does anyone have any ideas?
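For readers with the same question: although the source node list of a block is de-duplicated, the block's edge list still records which source was sampled for which destination. Below is a minimal sketch in plain Python of that lookup. The helper name `neighbors_per_dst` and the toy data are hypothetical; with a real block, the inputs would presumably come from `blocks[0].edges()`, `blocks[0].srcdata['_ID']` and `blocks[0].dstdata['_ID']`, converted with `.tolist()`.

```python
from collections import defaultdict

def neighbors_per_dst(edge_src, edge_dst, src_ids, dst_ids):
    """Map each destination (original ID) to the original IDs of its sampled sources.

    edge_src / edge_dst hold block-local indices; src_ids / dst_ids map
    those local indices back to original node IDs.
    """
    result = defaultdict(list)
    for s, d in zip(edge_src, edge_dst):
        result[dst_ids[d]].append(src_ids[s])
    return dict(result)

# Toy stand-in for one block (hypothetical data, not real DGL output).
# Node 30 appears once in src_ids but is a sampled neighbor of both 10 and 20.
src_ids = [10, 20, 30, 40]   # local src index -> original node ID
dst_ids = [10, 20]           # local dst index -> original node ID
edge_src = [2, 3, 2]         # local source index of each edge
edge_dst = [0, 0, 1]         # local destination index of each edge

nbrs = neighbors_per_dst(edge_src, edge_dst, src_ids, dst_ids)
print(nbrs)
```

A destination that has no sampled neighbor simply does not appear in the returned dictionary.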

BarclayII · August 14, 2023, 9:36am

You can’t do so outside the DataLoader. One thing you could try is to customize the training node IDs you feed into the DataLoader: instead of

train_dataloader = DataLoader(graph, train_nids, sampler, ...)

How about

train_dataloader = DataLoader(graph, train_nids_with_nodes_removed, sampler, ...)

That is, you remove the nodes you don’t care about before constructing the DataLoader.

yichuan520030910320 · August 14, 2023, 3:09pm

Ah, this is not what I wanted to do. What I want to do is to make some graph changes based on certain properties of the first layer. These properties cannot be known from the train_id. Is there any way to handle the blocks to make some graph changes, or can it only be modified inside the dataloader?

yichuan520030910320 · August 14, 2023, 3:53pm

What if I want a block that is not fused? Does that mean I need to modify _CAPI_DGLToBlock?

BarclayII · August 16, 2023, 6:23am

I see. I guess in this case you will have to write your own sampler (i.e. replace dgl.dataloading.NeighborSampler with your own, taking care of node removal there).
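As a rough illustration of the node-removal step such a custom sampler would need, here is a plain-Python sketch that drops unwanted destination nodes from one layer and re-compacts the remaining nodes. The function `keep_dst_nodes` and the toy data are hypothetical; in an actual sampler the same filtering would presumably be applied to the sampled frontier before it is converted to a block, and that wiring is not shown here.

```python
def keep_dst_nodes(edge_src, edge_dst, src_ids, dst_ids, keep):
    """Drop destinations whose original ID is not in `keep`, then re-compact.

    Returns new local edge lists plus new src/dst ID lists. Kept destinations
    are placed first in the new source list, mirroring the block layout where
    destination nodes form a prefix of the source nodes.
    """
    new_dst_ids = [n for n in dst_ids if n in keep]
    new_src_ids = list(new_dst_ids)
    local = {n: i for i, n in enumerate(new_src_ids)}  # original ID -> new local index
    new_src, new_dst = [], []
    for s, d in zip(edge_src, edge_dst):
        d_orig = dst_ids[d]
        if d_orig not in keep:
            continue  # edge points at a removed destination
        s_orig = src_ids[s]
        if s_orig not in local:
            # First time this source is seen: append it after the destinations.
            local[s_orig] = len(new_src_ids)
            new_src_ids.append(s_orig)
        new_src.append(local[s_orig])
        new_dst.append(local[d_orig])
    return new_src, new_dst, new_src_ids, new_dst_ids

# Toy layer (hypothetical data): edges 30->10, 40->10, 30->20, 50->20.
src_ids = [10, 20, 30, 40, 50]
dst_ids = [10, 20]
edge_src = [2, 3, 2, 4]
edge_dst = [0, 0, 1, 1]

# Keep only destination 20; node 10 and its exclusive source 40 disappear.
filtered = keep_dst_nodes(edge_src, edge_dst, src_ids, dst_ids, keep={20})
print(filtered)
```

Sources that were only needed by removed destinations drop out automatically, because a source is added to the new list only when one of its edges survives the filter.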

system · September 15, 2023, 6:24am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.