How to get the neighbors of each node in a minibatch, or how to handle the block graph

When using DGL, if we use the following method to obtain a minibatch:

for it, (input_nodes, output_nodes, blocks) in enumerate(train_dataloader):

blocks[0].dstdata['_ID'] and blocks[0].srcdata['_ID'] both return node IDs after ‘fusion’ (de-duplication by DGL). How can I find out which node each entry in blocks[0].srcdata['_ID'] was sampled as a neighbor of, without getting the de-duplicated results? Or is there a way to remove specific nodes from the bipartite graph blocks[0]? I only want to keep some of the nodes in blocks[0].dstdata and remove the others together with their corresponding source nodes. Does anyone have any ideas?
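On the first part of the question: the node lists are de-duplicated, but the edges of the block still record which source was sampled for which destination. The sketch below shows the lookup on plain Python lists so it runs without DGL. It assumes the local edge endpoints come from something like `src, dst = blocks[0].edges()` and the ID lists from `blocks[0].srcdata['_ID']` and `blocks[0].dstdata['_ID']`, each converted with `.tolist()`. The function name `neighbors_per_dst` and the toy data are made up for illustration.

```python
def neighbors_per_dst(src_local, dst_local, src_ids, dst_ids):
    """Group sampled neighbors by destination node, in original node IDs.

    src_local, dst_local: edge endpoints as block-local indices
        (assumed to come from blocks[0].edges()).
    src_ids, dst_ids: original node IDs of the block's source and
        destination nodes (assumed to come from srcdata/dstdata '_ID').
    """
    per_dst = {node_id: [] for node_id in dst_ids}
    for s, d in zip(src_local, dst_local):
        # Translate both endpoints from local indices to original IDs.
        per_dst[dst_ids[d]].append(src_ids[s])
    return per_dst


# Toy block: node 30 was sampled for both destinations, yet it appears
# only once in src_ids because the node list is de-duplicated.
src_ids = [10, 20, 30, 40]
dst_ids = [10, 20]
src_local = [2, 3, 2, 0]
dst_local = [0, 0, 1, 1]

neighbors = neighbors_per_dst(src_local, dst_local, src_ids, dst_ids)
print(neighbors)  # {10: [30, 40], 20: [30, 10]}
```

The edge list keeps one entry per sampled (source, destination) pair, so a node shared by several destinations shows up once per destination here even though it is stored once in the node list.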

You can’t do so outside the DataLoader. One thing you could try is to customize the training node IDs you feed into the DataLoader: instead of

train_dataloader = DataLoader(graph, train_nids, sampler, ...)

How about

train_dataloader = DataLoader(graph, train_nids_with_nodes_removed, sampler, ...)

That is, you remove the nodes you don’t care about before constructing the DataLoader.

Ah, this is not what I wanted to do. What I want to do is to make some graph changes based on certain properties of the first layer. These properties cannot be known from the train_id. Is there any way to handle the blocks to make some graph changes, or can it only be modified inside the dataloader?

What if I want a block that is not fused? Does that mean I need to modify _CAPI_DGLToBlock?

I see. I guess in this case you will have to write your own sampler (i.e. replace dgl.dataloading.NeighborSampler with your own, taking care of node removal there).
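As a rough sketch of the node-removal step such a custom sampler could perform, the function below filters a sampled frontier before it is turned into a block. It works on plain Python lists of original node IDs so it runs without DGL. The assumption is that, inside the sampler, the frontier would come from a call such as `dgl.sampling.sample_neighbors` and the filtered result would then be passed to `dgl.to_block`. The name `filter_frontier` and the toy data are hypothetical.

```python
def filter_frontier(src, dst, seeds, keep):
    """Keep only the seeds in `keep` and the sampled edges pointing at them.

    src, dst: frontier edges as lists of original node IDs (src[i] -> dst[i]).
    seeds:    destination nodes the frontier was sampled for.
    keep:     destination nodes that should survive.
    Returns (kept_seeds, kept_src, kept_dst), still in original node IDs.
    """
    keep = set(keep)
    kept_seeds = [n for n in seeds if n in keep]
    kept_edges = [(s, d) for s, d in zip(src, dst) if d in keep]
    kept_src = [s for s, _ in kept_edges]
    kept_dst = [d for _, d in kept_edges]
    return kept_seeds, kept_src, kept_dst


# Toy frontier sampled for seeds 10 and 20; only seed 20 is kept.
seeds = [10, 20]
src = [30, 40, 30, 10]
dst = [10, 10, 20, 20]

kept_seeds, kept_src, kept_dst = filter_frontier(src, dst, seeds, keep=[20])
print(kept_seeds, kept_src, kept_dst)  # [20] [30, 10] [20, 20]
```

Node 40 only fed the removed destination 10, so it disappears along with that edge. Note that the destination nodes of blocks[0] are the source nodes of blocks[1], so removing destinations from the first block also changes what the next layer can read; the sampler has to keep the two consistent.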