Subgraphing and batching Heterographs

I am trying to implement a simple DiffPool-style network, but with pre-defined clusters and layers. I am having trouble calling conv layers on subgraphs of batched heterographs.

The setup:

import dgl
import torch

data_dict = {}
data_dict[('A', 'AA0', 'A')] = ([0,1,2,3],[1,0,3,2])
data_dict[('A', 'AB0', 'B')] = ([0,1],[0,0])
data_dict[('A', 'AB1', 'B')] = ([2,3],[1,1])
data_dict[('B', 'BC0', 'C')] = ([0,1],[0,0])
num_nodes_dict = {'A':4, 'B': 2, 'C':1}
base_graph = dgl.heterograph(data_dict, num_nodes_dict)

The graph is like a tree, with the leaf nodes of type A, the middle nodes of type B, and the root of type C. I want to perform a series of convolutions on a batch of these graphs. Note that these batched graphs are instance attributes of the model. The data coming in is a batch of A-type features. Here is what I have so far:
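For reference, a quick sanity check (a minimal sketch using the setup above) confirms the structure:

print(base_graph.canonical_etypes)
# e.g. [('A', 'AA0', 'A'), ('A', 'AB0', 'B'), ('A', 'AB1', 'B'), ('B', 'BC0', 'C')]
print({ntype: base_graph.num_nodes(ntype) for ntype in base_graph.ntypes})
# {'A': 4, 'B': 2, 'C': 1}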

Build Batched Subgraphs by Type

from copy import copy
batch_size = 2
aa_subgraph = dgl.batch([copy(base_graph.edge_type_subgraph(['AA0'])) for _ in range(batch_size)])

ab_subgraph = dgl.batch([copy(base_graph.edge_type_subgraph(['AB0','AB1'])) for _ in range(batch_size)])

bc_subgraph = dgl.batch([copy(base_graph.edge_type_subgraph(['BC0'])) for _ in range(batch_size)])
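For reference, a minimal check of what dgl.batch produces here: it concatenates the per-graph node sets and relabels node IDs, so a batched graph has no separate batch axis.

print(aa_subgraph.batch_size)      # 2
print(aa_subgraph.num_nodes('A'))  # 8, i.e. batch_size * 4 -- nodes are concatenated, not stacked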

Build Conv Layers

aa_conv = dgl.nn.HeteroGraphConv({'AA0':dgl.nn.GraphConv(2,4,allow_zero_in_degree=True)})
ab_conv = dgl.nn.HeteroGraphConv({etype:dgl.nn.GATConv((4,8),8,1) for etype in ['AB0','AB1']})
bc_conv = dgl.nn.HeteroGraphConv({'BC0':dgl.nn.GATConv((8,16),16,1)})

Data Comes In
The incoming data is a batch of features for ntype A:

a_feats = torch.randn((2,4,2))

Forward

a_feats = aa_conv(aa_subgraph, {'A':a_feats})['A']
# ERROR ON THE ABOVE LINE
b_feats = ab_conv(ab_subgraph, ({'A':a_feats, },{'B':torch.zeros((2,2,8))}))['B']
c_feats = bc_conv(bc_subgraph, ({'B':b_feats},{'C':torch.zeros((2,1,16))}))['C']

return c_feats

Error
The size of tensor a (2) must match the size of tensor b (8) at non-singleton dimension 0

While I have included more information here than is needed to replicate the problem, I mostly wanted to ask whether this is the recommended approach to a problem like this. Are there better ways to subgraph a batch of heterographs? Why is my conv layer splitting up my batch internally? I am splitting my tree graph into a cascade of bipartite graphs. Is there a native way to do this, or is going step by step correct?

Thank you all in advance

To add to this, the heterograph batching example here fails for me on 0.6.0 with the error

DGLError: Node type name must be specified if there are more than one node types.

I found a way around this, but I am not sure if it is best practice. I essentially just flatten the batch dimension and have to .view() the tensor after each step in my forward.

Forward
a_feats = a_feats.view((8, 2))  # flatten (batch=2, nodes=4, feat=2) -> (8, 2); the batched graph has 8 A nodes
a_feats = aa_conv(aa_subgraph, {'A': a_feats})['A']
a_feats = a_feats.squeeze()  # Or .view() here, but I am always using 1 head on my GAT

b_feats = ab_conv(ab_subgraph, ({'A': a_feats}, {'B': torch.zeros((4, 8))}))['B']  # 4 = batched B nodes
b_feats = b_feats.squeeze()

c_feats = bc_conv(bc_subgraph, ({'B': b_feats}, {'C': torch.zeros((2, 16))}))['C']  # 2 = batched C nodes
c_feats = c_feats.view((batch_size, 1, 16))  # restore the batch dimension

return c_feats

  1. Why did you use copy? dgl.batch returns a new graph.
  2. If you want to apply graph convolution to multiple edge types sequentially, then you don't need HeteroGraphConv; just do something like GNNConv()(base_graph['AA0'], ...).
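For instance, a minimal sketch of the single-etype case (reusing base_graph from the question; the feature sizes are illustrative):

import torch
import dgl

conv = dgl.nn.GraphConv(2, 4, allow_zero_in_degree=True)
a_feats = torch.randn((base_graph.num_nodes('A'), 2))
# Slicing a heterograph by edge type yields a single-relation graph
# that a plain GraphConv can consume directly:
out = conv(base_graph['AA0'], a_feats)  # shape: (4, 4)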

Which example? I tried the example for heterogeneous graphs in dgl.batch with 0.6 and it worked well.

  1. Good point, I don't need copy, as I am not modifying the graph in my conv layers and batch returns a new graph. Thanks!

  2. For etypes AB0 and AB1 on the next layer, I use a HeteroGraphConv(), which I call ab_conv. The subgraph found by graph.edge_type_subgraph([...]) has multiple edge types and I want to process them all in parallel. Is a HeteroGraphConv() needed in this case? I used a HeteroGraphConv for aa_conv and bc_conv for consistency reasons even though they each only deal with one edge type. Is this okay?

When I ran the last batch example:

import torch as th
import dgl
import sys
print('python version: ', sys.version)
print('torch version: ', th.__version__)
print('dgl version: ', dgl.__version__)

hg1 = dgl.heterograph({
    ('user', 'plays', 'game') : (th.tensor([0, 1]), th.tensor([0, 0]))})
hg2 = dgl.heterograph({
    ('user', 'plays', 'game') : (th.tensor([0, 0, 0]), th.tensor([1, 0, 2]))})
bhg = dgl.batch([hg1, hg2])

print('size: ', bhg.batch_size)
print('num_nodes: ', bhg.batch_num_nodes())
print('num_edges: ', bhg.batch_num_edges())

I got output:

python version:  3.6.10 |Anaconda, Inc.| (default, Jan  7 2020, 21:14:29) 
[GCC 7.3.0]
torch version:  1.7.1
dgl version:  0.6.0
size:  2

and then error:

---------------------------------------------------------------------------
DGLError                                  Traceback (most recent call last)
<ipython-input-20-580327dc4706> in <module>
     13 
     14 print('size: ', bhg.batch_size)
---> 15 print('num_nodes: ', bhg.batch_num_nodes())
     16 print('num_edges: ', bhg.batch_num_edges())

~/anaconda3/envs/pytorch_latest_p36/lib/python3.6/site-packages/dgl/heterograph.py in batch_num_nodes(self, ntype)
   1284         if ntype is None:
   1285             if len(self.ntypes) != 1:
-> 1286                 raise DGLError('Node type name must be specified if there are more than one '
   1287                                'node types.')
   1288             ntype = self.ntypes[0]

DGLError: Node type name must be specified if there are more than one node types.

> When I do etypes AB0 and AB1 on the next layer, I use a HeteroGraphConv(), which I call ab_conv. The subgraph found by graph.edge_type_subgraph([...]) has multiple edge types and I want to process them all in parallel. Is a HeteroGraphConv() needed in this case? I used a HeteroGraphConv for aa_conv and bc_conv for consistency reasons even though they each only deal with one edge type. Is this okay?

I assume you did something like graph.edge_type_subgraph(['interacts']) while graph has multiple canonical edge types of the form (*, 'interacts', *). DGL only allows graph.edge_type_subgraph([('A', 'interacts', 'B'), ('C', 'interacts', 'D'), ...]) in that case.
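For example (a small sketch, not your exact graph):

import torch as th
import dgl

g = dgl.heterograph({
    ('A', 'interacts', 'B'): (th.tensor([0]), th.tensor([0])),
    ('C', 'interacts', 'D'): (th.tensor([0]), th.tensor([0]))})
# g.edge_type_subgraph(['interacts'])  # ambiguous, raises DGLError
sub = g.edge_type_subgraph([('A', 'interacts', 'B')])  # works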

If you want to learn one model per canonical edge type, you can initialize your model with model = HeteroGraphConv({('A', 'interacts', 'B'): model1, ('C', 'interacts', 'D'): model2, ...}). During forward computation, the HeteroGraphConv object will perform computation for all canonical edge types in parallel.

If you want to learn a shared model for all such canonical edge types, you don’t need HeteroGraphConv. Just initialize a model and loop over the canonical edge types in forward computation.
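A minimal sketch of that pattern (the names are illustrative; it assumes every node type has the same input feature size):

import torch.nn as nn
import dgl

class SharedRelConv(nn.Module):
    """Apply one shared GraphConv to every canonical edge type."""
    def __init__(self, in_feats, out_feats):
        super().__init__()
        self.conv = dgl.nn.GraphConv(in_feats, out_feats, allow_zero_in_degree=True)

    def forward(self, g, feats):
        # feats: dict mapping ntype -> (num_nodes, in_feats) tensor
        outputs = {}
        for srctype, etype, dsttype in g.canonical_etypes:
            h = self.conv(g[srctype, etype, dsttype],
                          (feats[srctype], feats[dsttype]))
            # Sum contributions when several relations share a destination type
            outputs[dsttype] = outputs.get(dsttype, 0) + h
        return outputs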

When a DGLGraph has multiple node types and edge types, you need to call bhg.batch_num_nodes(node_type) and bhg.batch_num_edges(edge_type).
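For the batched graph above, that looks like this (values worked out from the edge lists):

print('num_nodes (user): ', bhg.batch_num_nodes('user'))    # tensor([2, 1])
print('num_nodes (game): ', bhg.batch_num_nodes('game'))    # tensor([1, 3])
print('num_edges (plays): ', bhg.batch_num_edges('plays'))  # tensor([2, 3])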
