Node Classification on Hetero-Graphs using Node Features

TasmanGC · October 19, 2021, 3:36am

Hi @mufeili thanks for your past guidance. I come now with new questions.

Context
I’m interested in using a single GNN to perform binary node classification across multiple heterographs.

I have a set of ~4000 bipartite graphs, with nodes a and b each with 2000 connections.
These nodes have different feature spaces (a’ and b’) but share a common label, which is binary.

I’m trying to perform binary classification on the nodes in these graphs, so that my model can then predict labels on new, unseen graphs where there are no labels but all node features are present.

I’ve read the discussion on node classification over multiple graphs found here.
As a result I have experimented with batching and the custom collate functions as necessary.
However I’ve failed to translate that training practice to heterographs.

This is complicated by a lack of heterograph examples that use node features as opposed to learnable embeddings.

Questions
1 - How do I alter the HeteroRGCN model here to allow for the usage of node features?
2 - Should I be batching my heterographs or simply iterating over the graph dataset?
3 - Given the graph structure below what’s the best way to structure my training loop? Are there any good examples I’ve missed?
4 - Do you have any recommended readings for my setting? I’ve discovered 1 and 2 so far.

Heterograph Construction

This is all wrapped up in a DGLDataset object, such that __getitem__ returns dgl_hetero_graph.

# u, v are the edge endpoint IDs; a_feat, a_labl, b_feat, b_labl are per-node arrays
dgl_hetero_graph = dgl.heterograph({('a', 'link', 'b'): (u, v)})
dgl_hetero_graph.nodes['a'].data['a_feat'] = torch.tensor(a_feat, dtype=torch.float32)
dgl_hetero_graph.nodes['a'].data['a_labl'] = torch.tensor(a_labl, dtype=torch.float32)
dgl_hetero_graph.nodes['b'].data['b_feat'] = torch.tensor(b_feat, dtype=torch.float32)
dgl_hetero_graph.nodes['b'].data['b_labl'] = torch.tensor(b_labl, dtype=torch.float32)

Thanks again

TasmanGC · October 20, 2021, 3:42am

Building on this, I’m fairly confident I’m now building the node feature dictionary correctly, with the ntypes of the heterograph as its keys.

However, I’m uncertain what is happening in this block.

      funcs[etype] = (fn.copy_u('Wh_%s' % etype, 'm'), fn.mean('m', 'h'))
G.multi_update_all(funcs, 'sum')
# return the updated node feature dictionary
return {ntype : G.nodes[ntype].data['h'] for ntype in G.ntypes}

I’m getting a KeyError for ‘h’.
I thought that ‘h’ would be created during the G.multi_update_all call.

VoVAllen · October 20, 2021, 5:02am

Is ‘h’ missing in all node types or only in certain node types?

TasmanGC · October 20, 2021, 5:59am

Ooh! What an excellent question!
Before running the model, ‘h’ is not present on either node type. After the forward call, however, the keys look like this:

# node type a
G.nodes['a'].data.keys()
dict_keys(['a_data', 'h'])

# where for node type b
G.nodes['b'].data.keys()
dict_keys(['b_data', 'Wh_link'])

Why does each node type only get one of the keys?
Is this because of how I’ve constructed the graph? Should I construct my graph as below instead?

dgl_hetero_graph = dgl.heterograph({('a', 'link', 'b'): (u, v), ('b', 'link', 'a'): (v, u)})

VoVAllen · October 21, 2021, 5:52am

For a bipartite graph, perhaps only the destination node type receives the reduced feature from the source type? I’m not 100% sure. You could also try adding reverse edges to the same etype graph, for example by passing (th.cat([u,v]), th.cat([v,u])).