Sample the features of a target Node's Neighborhood from a Block (heterograph origin)

TasmanGC · April 8, 2022, 4:26pm

My Problem

I’ve been working from the stochastic training example we all know and love. As such, I have a block generated by this little number. The train_graph is a heterograph with two node types (a, b) and one edge type (link).

sampler     = MultiLayerFullNeighborSampler(2)
trainloader = NodeDataLoader(train_graph, train_nodes, sampler, batch_size=b_size, shuffle=False, drop_last=False, device=device)

My goal

What I want to do is load a feature (let’s call it feat) from all the neighbours of a given destination node in a given block. This is ultimately so I can apply a weighting to my custom loss function based on the variation of feat between the given node’s neighbours. The function I need is something along the lines of:

def my_dream_func(block, feat, dst_id, ntype):
    horizon_feats = ...  # magic
    return {'neigh_feat': [horizon_feats]}

My attempt

The code below works for one sample (if I just look at one block from the data loader). However, when I iterate over the loader: boom, WinError. As you can no doubt see, it also seems really inefficient: get a block, turn it into a graph, get a subgraph, then make a block…

for _,_,blocks in trainloader:
    b_graph = dgl.block_to_graph(blocks[-1]).cpu()
    for dtype in ['a_dst','b_dst']:
        if b_graph.num_nodes(dtype)!=0:
            weights = []
            for id in b_graph.nodes[dtype].data['_ID']:
                frontier = dgl.in_subgraph(b_graph, {dtype:id})
                block_copy = dgl.to_block(frontier, {dtype:id})
                src_data = block_copy.srcdata
                for n_type in src_data['feat'].keys():
                    type_labels = src_data['feat'][n_type].detach().numpy()

My Question

Is there an easy obvious way to get a feature from all neighbors of a given node, from a block, generated from a heterograph?

Notes

I can provide details of the error but I suspect I’m just missing a more obvious solution.
I was previously working with GPU, but I’ve tested on just the cpu and it didn’t resolve my issue.

BarclayII · April 11, 2022, 6:12am

Do you want to retrieve the neighbors of one single node from a block? You can first find the nodes with block.in_edges (which has the same usage as g.in_edges):

# say your node ID is v
# in_edges returns a (source IDs, destination IDs) pair by default
neighbor_ids, _ = block.in_edges(v, etype=your_edge_type)
neighbor_features = block.srcdata['feat'][neighbor_ids]

TasmanGC · April 11, 2022, 10:11am

Hey! Thanks, that is a nice alternative. A bit of shuffling around is needed, as the returned tensor neighbor_ids can’t be used to index the dictionary from block.srcdata['feat'] (given the heterograph source of the block). That said, your proposed method would work much more nicely.

However, much like my earlier attempts I can run it once but not iteratively. Specifically I tested by calling the in_edges or out_edges method on blocks returned from a dataloader as in the code below.

Note: I generate the v value from the block itself to ensure the node exists in that block. I’ve also confirmed that every block contains at least one node of type a.

This works

_, _, blocks = next(iter(trainloader))
blocks[-1].in_edges(blocks[-1].ndata['_ID']['a'][0], etype=('a','link','b'))

This Doesn’t

for _,_,blocks in tqdm(trainloader):
    blocks[-1].in_edges(blocks[-1].ndata['_ID']['a'][0], etype=('a','link','b'))

As a result I get OSError: [WinError -529697949] Windows Error 0xe06d7363, after only a single execution.

It might be linked to the bug below, but I’m working on a minimal example to confirm.
[Bug] in querying Heterograph structure · Issue #3854 · dmlc/dgl (github.com)

TasmanGC · April 11, 2022, 10:25am

Based on the minimal example in the above bug link, maybe I’m using the wrong v value? I can replicate the bug from that post, and in that case it was linked to an out-of-bounds error.

BarclayII · April 11, 2022, 11:12am

No you don’t need the shuffling, because the IDs passed to blocks[-1].in_edges should be the IDs within the block, while blocks[-1].ndata['_ID'] represents the IDs in the original graph.

If you want to find which node in the block corresponds to a given node ID v in the original graph, you will need to do an inverse mapping instead. That is, find which element in blocks[-1].dstdata['_ID'] (or the second element of next(iter(trainloader)), which is the same) has the value v.
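
One way to sketch the inverse mapping described above is a plain dictionary from original ID to position. The helper name build_inverse_map is made up for illustration. The list below is only a stand-in for a per-type '_ID' column such as blocks[-1].dstdata['_ID'][ntype], so that the sketch runs without DGL. A 1-D tensor should work the same way, since iterating over it yields scalars that int() can convert.

```python
def build_inverse_map(original_ids):
    """Map original-graph node ID -> block-local node ID.

    original_ids is a per-type '_ID' column: position i holds the
    original ID of block-local node i. (Hypothetical helper.)
    """
    return {int(orig): local for local, orig in enumerate(original_ids)}


# Stand-in for the '_ID' column of one destination node type.
dst_original_ids = [42, 7, 19]
to_local = build_inverse_map(dst_original_ids)

# Original node 7 sits at block-local position 1.
local_v = to_local[7]
```

The resulting local_v is the kind of value that in_edges expects. An original ID that is missing from the dictionary is not a destination node of that block.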

TasmanGC · April 11, 2022, 12:27pm

Oooh! That’s good to know. How are the new ID values generated for a given block?

BarclayII · April 11, 2022, 1:21pm

I don’t think there’s a rule to map the original node ID to the new ID (it’s done in C++'s std::unordered_map if you care). But essentially g.srcdata['_ID'] and g.dstdata['_ID'] show how the block’s source node IDs and destination node IDs are mapped to the original graph, so you can take the inverse to figure out how the IDs are mapped the other way around.

Regarding the error, I think that’s the same issue as the one you mentioned.