Convolutions with mini-batches of heterogeneous graphs

  1. You can safely replace self.rgcn.forward(blocks, h) with self.rgcn(blocks, h). This is because self.rgcn inherits from torch.nn.Module, whose __call__(self, ...) invokes forward(self, ...).
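A minimal illustration of the equivalence (the variable names just follow your snippet):

out = self.rgcn(blocks, h)          # preferred: __call__ runs forward() plus any registered hooks
out = self.rgcn.forward(blocks, h)  # same result, but bypasses the hooks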

I have defined self.predictor as an instance of the ScorePredictor, but I’ve got the feeling that this is not going to work for heterographs (with multiple node types). Do I have to iterate through the node types here?

  1. For a heterogeneous graph with multiple node types, you can check whether x is a dictionary mapping node types to the corresponding features. If so, this should work. For link prediction, ScorePredictor iterates over all canonical edge types and implicitly iterates over the associated node types. Meanwhile, it’s possible that you don’t want to perform link prediction for all edge types but only for a subset of edge types.
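A sketch along the lines of the user-guide link-prediction predictor (u_dot_v is just one possible score function; restrict the loop to a subset of canonical edge types if you only want scores for those):

import dgl.function as fn
import torch.nn as nn

class ScorePredictor(nn.Module):
    def forward(self, edge_subgraph, x):
        # x: dict mapping node type -> updated node representations from the GNN
        with edge_subgraph.local_scope():
            edge_subgraph.ndata['x'] = x
            for etype in edge_subgraph.canonical_etypes:
                # dot product between the representations of the two endpoints of each edge
                edge_subgraph.apply_edges(fn.u_dot_v('x', 'x', 'score'), etype=etype)
            return edge_subgraph.edata['score']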

And what does x refer to in edge_subgraph.ndata['x']? Does it have to be changed to h_dst, which is used in the CustomHeteroGraphConv?

  1. For link prediction, you need to consider the updated representations of both source nodes and destination nodes, hence x is the updated feature for all nodes from the GNN.

Should the forward() function of the BaseRGCN then additionally take h as an input and not construct it in the function? (I commented out the lines that were used before):

  1. It really depends on how you want to perform modeling, i.e. whether you want to pre-process it before invoking message passing.

@mufeili Thank you for your response!

  1. h is defined as follows, meaning a dictionary mapping node types to features, as you said:
h = {
    ntype: blocks[0].srcdata[ntype]
    for ntype in blocks[0].ntypes
}

Why is it that only the srcdata and not the dstdata is used here?

  1. Could you explain why there is this distinction between source and destination nodes for mini-batching? In the 6.5 tutorial, h_src is defined as all of the node features, while h_dst is only a slice of them, so the two overlap:
h_src = h
h_dst = h[:block.number_of_dst_nodes()]

Why is it that only the srcdata and not the dstdata is used here?

As mentioned in the user guide, " If the features are stored in g.ndata , then the features can be loaded by accessing the features in blocks[0].srcdata , the features of input nodes of the first block, which is identical to all the necessary nodes needed for computing the final representations."

  1. Could you explain why there is this distinction between source and destination nodes for mini-batching? In the 6.5 tutorial, h_src is defined as all of the node features, while h_dst is only a slice of them, so the two overlap.

Assume you have a single GNN layer and you want to update some destination nodes with neighbor sampling. If you have N destination nodes to update and you sample M neighbors for each of them, then the computation involves O(MN) nodes, which is significantly larger than N. This is why we want to handle the source and destination nodes separately.

Meanwhile, for full graph training, most nodes are simultaneously source and destination nodes so it’s fine not to handle them separately.
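A small sketch of this effect (toy homogeneous graph and the DGL 0.5-era dataloading API; the numbers are illustrative):

import torch
import dgl

g = dgl.rand_graph(1000, 20000)                            # hypothetical graph
sampler = dgl.dataloading.MultiLayerNeighborSampler([10])  # one layer, 10 sampled neighbors
dataloader = dgl.dataloading.NodeDataLoader(
    g, torch.arange(32), sampler, batch_size=32)
input_nodes, output_nodes, blocks = next(iter(dataloader))
block = blocks[0]
# Many more source (input) nodes than destination (output) nodes:
print(block.num_src_nodes(), block.num_dst_nodes())        # e.g. roughly 300 vs 32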


When I try to run this I get an error that local_var() cannot be applied to a List object, since the blocks is now forwarded instead of the single graph entity g:

I tried to replace this in the forward() function of the RelGraphConvLayer as follows:

class RelGraphConvLayer(nn.Module):
    ....

    def forward(self, g, inputs):
       ...
        # g = g.local_var()
        if g.is_block:
            g = [g.local_var() for g in g]
        else:
            g = g.local_var()

But this gives me the following error:

  File "/.../model/MiniBatchLinkPredict.py", line 119, in forward
    if g.is_block:
AttributeError: 'list' object has no attribute 'is_block'

@mufeili Thank you for the explanation :pray:

  1. For the list of blocks generated in this case, blocks[i] represents the computation dependency for the i-th graph conv layer. It is recommended to pass a single block to a graph conv layer at a time and let the full model control that. Below is an example:
import torch.nn as nn
import dgl.nn as dglnn

class StochasticTwoLayerRGCN(nn.Module):
    def __init__(self, in_feat, hidden_feat, out_feat, rel_names):
        super().__init__()
        # One GraphConv per relation, wrapped by HeteroGraphConv
        self.conv1 = dglnn.HeteroGraphConv({
            rel: dglnn.GraphConv(in_feat, hidden_feat, norm='right')
            for rel in rel_names
        })
        self.conv2 = dglnn.HeteroGraphConv({
            rel: dglnn.GraphConv(hidden_feat, out_feat, norm='right')
            for rel in rel_names
        })

    def forward(self, blocks, x):
        # The first graph conv layer handles the first block
        x = self.conv1(blocks[0], x)
        # The second graph conv layer handles the second block
        x = self.conv2(blocks[1], x)
        return x
  1. is_block is a property of DGLGraph, which can be found here. It determines whether the graph object is a block. The API doc should probably describe that.
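A quick check on a toy graph:

import dgl

g = dgl.rand_graph(10, 30)   # hypothetical homogeneous graph
print(g.is_block)            # False
block = dgl.to_block(g)      # convert the graph into a block
print(block.is_block)        # True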

A block is basically = [DGLGraph, DGLGraph, DGLGraph, ...], where the number of subgraphs depends on the batch size, right? So as you show here, the item of blocks used in each layer corresponds to the index of the layer - which poses a problem if there are more DGLGraphs in blocks than we have layers.
What I would have thought is that you pass each item of blocks into the first layer and then the information from the nodes of this subgraph gets propagated, meaning we would have to iterate through the blocks.
My implementation of the forward() function passes blocks as a whole and iterates through the layers. Is it then safe to change blocks to blocks[0]? How can I be sure that all DGLGraphs from blocks are being propagated?

class BaseRGCNHetero(nn.Module):
    ....
    def forward(self, blocks, h):
        for idx, layer in enumerate(self.layers):
            # h = layer.forward(blocks, inputs=h), change to:
            h = layer.forward(blocks[0], inputs=h)
        return h

I now have an error which is truly puzzling. The error happens in the forward() method of the CustomHeteroGraphConv, where the linear embedding should be generated for each node type's tensor:

Let’s do some detective work: I’ll start at the beginning where I am passing h into the forward() function of the RelGraphConv, which is a dictionary mapping node type to node tensor:

>>> h
ParameterDict(
    (disease): Parameter containing: [torch.FloatTensor of size 24x116]
    (drug): Parameter containing: [torch.FloatTensor of size 50x116]
    (protein): Parameter containing: [torch.FloatTensor of size 372x116]
)

When I then pass the h (as inputs) to self.conv, which is the CustomHeteroGraphConv, the really puzzling thing happens:

h here is only a tuple of tensors and no longer a dictionary mapping node type to tensor:

>>> h
(Parameter containing:
tensor([[ 0.2510, -0.1729,  0.2618,  ..., -0.0329,  0.0395,  0.2230],
       ...,
        [-0.1161, -0.0539, -0.0565,  ...,  0.2624, -0.1471,  0.2224]],
       requires_grad=True), tensor([[-0.1195,  0.1309, -0.0337,  ..., -0.0942,  0.0333, -0.1135],
      ...,
        [ 0.0428,  0.1437,  0.1553,  ...,  0.0253,  0.1175,  0.1290]],
       grad_fn=<SliceBackward>))

The logical consequence is now that I am getting an error when I try to access the h[ntype] in the forward() function of the CustomHeteroGraphConv:

Error:

  File "/.../model/CustomHeteroGraphConv.py", line 36, in forward
    block.dstnodes[ntype].data['h_dst'] = self.Vs(h[ntype])
TypeError: tuple indices must be integers or slices, not str

Do you know why it happens that h is not a dictionary anymore, but a tuple of tensors? I would assume they are the source and destination node tensors, but I am unsure why.

A block is basically = [DGLGraph, DGLGraph, DGLGraph, ...]

This should be a list of blocks rather than a block. A block is a special DGLGraph.

, where the number of subgraphs depends on the batch size, right?

The number of blocks depends on the number of GNN layers. To be more accurate, the list of blocks generated by MultiLayerNeighborSampler in each iteration represents the computation dependency for neighbor sampling across multiple GNN layers. For example, consider the following two-layer case.

To update the red node with two GNN layers, we need to sample its neighbors V_1 as well as the neighbors of the nodes in V_1. This can be realized by a list of two blocks, where blocks[0] represents the edges from the yellow nodes to the green nodes and blocks[1] represents the edges from the green nodes to the red node.

You can then use two GNN layers, where the first GNN layer performs message passing on the edges from the yellow nodes to the green nodes and outputs the representations of the green nodes; the second GNN layer performs message passing on the edges from the green nodes to the red node and outputs the representation of the red node.
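Applied to your BaseRGCNHetero, a sketch of the intended pairing (assuming the sampler produces exactly one block per layer) would be:

class BaseRGCNHetero(nn.Module):
    ....
    def forward(self, blocks, h):
        # blocks[i] carries the computation dependency for self.layers[i]
        for layer, block in zip(self.layers, blocks):
            h = layer(block, h)
        return h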

It seems that you are confused about the concept of blocks. Have you read the overview of chapter 6 in the user guide and section 6.1? Are you able to understand the concept by reading them?


As described in the API doc, each conv module passed to HeteroGraphConv, CustomHeteroGraphConv in your case, should take two arguments in its forward function. The first argument is a DGLGraph object. The second argument should be a tensor representing the source node features or a pair of tensors representing the source and destination node features. When you pass a dictionary mapping node types to node features to the forward function of HeteroGraphConv, it will iterate over all canonical edge types and pass the corresponding node features to the corresponding conv module.
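A minimal sketch of a per-relation module that follows this calling convention (the name PerRelationConv and the mean-aggregation choice are illustrative, not from your code):

import torch.nn as nn
import dgl.function as fn

class PerRelationConv(nn.Module):
    def __init__(self, in_feats, out_feats):
        super().__init__()
        self.neigh = nn.Linear(in_feats, out_feats)  # transform aggregated neighbor features
        self.self_ = nn.Linear(in_feats, out_feats)  # transform destination node features

    def forward(self, graph, feat):
        # HeteroGraphConv passes either a single tensor (source features) or a
        # (source, destination) pair when the graph is a block/bipartite graph.
        if isinstance(feat, tuple):
            feat_src, feat_dst = feat
        else:
            feat_src = feat_dst = feat
        with graph.local_scope():
            graph.srcdata['h'] = feat_src
            graph.update_all(fn.copy_u('h', 'm'), fn.mean('m', 'h_neigh'))
            # Return a plain tensor; HeteroGraphConv aggregates the per-relation
            # results for each destination node type.
            return self.self_(feat_dst) + self.neigh(graph.dstdata['h_neigh'])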


Thank you @mufeili for the explanation!

The definition of the block is now clear to me, I had a logical error there.

I have a follow-up question about the forward() function of the CustomHeteroGraphConv, which I am calling here:

The first object I am passing is indeed a DGLGraph object:

Block(num_src_nodes=24, num_dst_nodes=48, num_edges=20)

The second object I am passing is the dictionary mapping node type to feature tensor:

inputs
ParameterDict(
    (disease): Parameter containing: [torch.FloatTensor of size 24x280]
    (drug): Parameter containing: [torch.FloatTensor of size 50x280]
    (protein): Parameter containing: [torch.FloatTensor of size 372x280]
)

But this is not what arrives at the forward function, because when I go into the forward() function of CustomHeteroGraphConv, the second parameter h that is passed is actually a tuple of tensors:

h[0].shape, h[1].shape
(torch.Size([24, 280]), torch.Size([43, 280]))

And it seems to me that this is the tuple of source and destination node features that you were talking about, since:

{ntype: block.number_of_src_nodes(ntype) for ntype in block.srctypes}
{'disease': 23}
{ntype: block.number_of_dst_nodes(ntype) for ntype in block.dsttypes}
{'drug': 43}

So my question now is: how do I have to adapt the forward() of the CustomHeteroGraphConv, since it clearly does not receive the dictionary mapping from node type to node features and therefore cannot pass the corresponding node features when iterating through the edge types?

Assuming that all convolution modules could be the same, since the feature length is the same for all node types, is there another way to adapt the forward() function?

You can find the implementation of HeteroGraphConv here. HeteroGraphConv is designed to handle the iteration over edge types and the dispatch of the node features corresponding to each edge type. That said, CustomHeteroGraphConv should only deal with a single edge type at a time.
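A toy illustration (hypothetical graph and feature sizes) of how HeteroGraphConv dispatches per-relation features, so each wrapped module only ever sees one edge type:

import torch
import dgl
import dgl.nn as dglnn

g = dgl.heterograph({
    ('drug', 'treats', 'disease'): (torch.tensor([0, 1]), torch.tensor([0, 1])),
    ('drug', 'binds', 'protein'):  (torch.tensor([0, 1, 0]), torch.tensor([0, 1, 2])),
})
h = {'drug': torch.randn(2, 8),
     'disease': torch.randn(2, 8),
     'protein': torch.randn(3, 8)}

conv = dglnn.HeteroGraphConv({
    rel: dglnn.GraphConv(8, 4, norm='right')
    for rel in g.etypes
}, aggregate='sum')

out = conv(g, h)
# out only contains the destination node types of the relations above:
# {'disease': tensor of shape (2, 4), 'protein': tensor of shape (3, 4)}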

@mufeili That makes sense, because the HeteroGraphConv is initialised with the dictionary mapping from edge type to CustomHeteroGraphConv.
Could you tell me if, in this case, the implementation from Tutorial 6.5 is still correct?
Because here, you would be iterating over the edge types for every block - but then we only consider one edge type per block, right?

for etype in g.canonical_etypes:
     ...

Also, could you tell me why we should be iterating through every node type here?
Since the h that I am receiving in the CustomHeteroGraphConv (as explained above) is a pair of tensors, there is no possibility to select h[ntype]. Can I therefore (safely) not iterate through g.ntypes and just set h_src, h_dst = h?

for ntype in g.ntypes:
    h_src, h_dst = h[ntype]
    g.dstnodes[ntype].data['h_dst'] = self.Vs[ntype](h[ntype])
    g.srcnodes[ntype].data['h_src'] = h[ntype]

My idea was then to adapt it as follows:

h_src, h_dst = h
dstnodetype = block.dsttypes[0]
block.dstdata['h_dst'] = self.Vs[dstnodetype](h_dst)
block.srcdata['h_src'] = h_src

Yet I am getting errors with this since the number of features does not match the number of nodes:

  File "/.../CustomHeteroGraphConv.py", line 39, in forward
    block.srcdata['h_src'] = h_src
  File "/.../lib/python3.7/site-packages/dgl/view.py", line 81, in __setitem__
    self._graph._set_n_repr(self._ntid, self._nodes, {key : val})
  File "/.../lib/python3.7/site-packages/dgl/heterograph.py", line 3752, in _set_n_repr
    ' Got %d and %d instead.' % (nfeats, num_nodes))
dgl._ffi.base.DGLError: Expect number of features to match number of nodes (len(u)). Got 24 and 21 instead.

For background info, here are the g that I am passing to self.conv, the block in the forward() function, and the shapes of the input features h:

>>> block
Block(num_src_nodes=21, num_dst_nodes=17, num_edges=5)
>>> g
Block(num_src_nodes={'disease': 21, 'drug': 45, 'protein': 367},
      num_dst_nodes={'disease': 5, 'drug': 17, 'protein': 311},
      ...,)
>>> h[0].shape
torch.Size([24, 68])
>>> h[1].shape
torch.Size([17, 68])

Do you have an idea why there is this discrepancy between the number of features (24), which matches the size of the first item of the tensor pair h[0] (which should be the source features), and the number of source nodes (21) in the block?

A question about the aggregator function of HeteroGraphConv:

def get_aggregate_fn(agg):
    """Internal function to get the aggregation function for node data
    generated from different relations.

    Parameters
    ----------
    agg : str
        Method for aggregating node features generated by different relations.
        Allowed values are 'sum', 'max', 'min', 'mean', 'stack'.

    Returns
    -------
    callable
        Aggregator function that takes a list of tensors to aggregate
        and returns one aggregated tensor.
    """
    if agg == 'sum':
        fn = th.sum
    elif agg == 'max':
        fn = lambda inputs, dim: th.max(inputs, dim=dim)[0]
    elif agg == 'min':
        fn = lambda inputs, dim: th.min(inputs, dim=dim)[0]
    elif agg == 'mean':
        fn = th.mean
    elif agg == 'stack':
        fn = None  # will not be called
    else:
        raise DGLError('Invalid cross type aggregator. Must be one of '
                       '"sum", "max", "min", "mean" or "stack". But got "%s"' % agg)
    if agg == 'stack':
        def stack_agg(inputs, dsttype):  # pylint: disable=unused-argument
            if len(inputs) == 0:
                return None
            return th.stack(inputs, dim=1)
        return stack_agg
    else:
        def aggfn(inputs, dsttype):  # pylint: disable=unused-argument
            if len(inputs) == 0:
                return None
# error is thrown here 
            stacked = th.stack(inputs, dim=0)
            return fn(stacked, dim=0)
        return aggfn

This throws an error at stacked = th.stack(inputs, dim=0), since the elements of inputs are dictionaries of tensors rather than tensors.

File "/.../lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/.../LinkPredictHetero.py", line 197, in forward
    outputs = self.rgcn.forward(blocks, h)
  File "/.../BaseRGCNHetero.py", line 76, in forward
    h = layer.forward(blocks[idx], h=h) #layer_number=idx)
  File ".../MiniBatchLinkPredict.py", line 132, in forward
    hs = self.conv(g, inputs_src, mod_kwargs=wdict)
  File ".../lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/.../lib/python3.7/site-packages/dgl/nn/pytorch/hetero.py", line 179, in forward
    rsts[nty] = self.agg_fn(alist, nty)
  File "/.../lib/python3.7/site-packages/dgl/nn/pytorch/hetero.py", line 221, in aggfn
    stacked = th.stack(inputs, dim=0)
TypeError: expected Tensor as element 0 in argument 0, but got dict

Which tensor should be selected to be stacked here? Would it be right to select
stacked = th.stack([inputs[i][dsttype] for i in range(len(inputs))], dim=1), since dsttype is passed as a parameter to the function?

That makes sense, because the HeteroGraphConv is initialised with the dictionary mapping from edge type to CustomHeteroGraphConv.
Could you tell me if, in this case, the implementation from Tutorial 6.5 is still correct?
Because here, you would be iterating over the edge types for every block - but then we only consider one edge type per block, right?

for etype in g.canonical_etypes:
   ...  

Also, could you tell me why we should be iterating through every node type here?
Since the h that I am receiving in the CustomHeteroGraphConv (as explained above) is a pair of tensors, there is no possibility to select h[ntype]. Can I therefore (safely) not iterate through g.ntypes and just set h_src, h_dst = h?

for ntype in g.ntypes:
    h_src, h_dst = h[ntype]
    g.dstnodes[ntype].data['h_dst'] = self.Vs[ntype](h[ntype])
    g.srcnodes[ntype].data['h_src'] = h[ntype]

The implementation of CustomHeteroGraphConv in user guide 6.5 is fine by itself because it is not used inside dgl.nn.HeteroGraphConv. Meanwhile, I agree this can be confusing and we may improve it.

When using dgl.nn.HeteroGraphConv, you are right that there should not be an iteration over edge types and h[ntype].

h_src, h_dst = h
dstnodetype = block.dsttypes[0]
block.dstdata['h_dst'] = self.Vs[dstnodetype](h_dst)
block.srcdata['h_src'] = h_src

Why do you need the code below? You can safely assume that when iterating over the edge types in HeteroGraphConv, each conv module corresponding to an edge type has a unique destination node type.

dstnodetype = block.dsttypes[0]
self.Vs[dstnodetype](h_dst)
>>> block
Block(num_src_nodes=21, num_dst_nodes=17, num_edges=5)
>>> g
Block(num_src_nodes={'disease': 21, 'drug': 45, 'protein': 367},
      num_dst_nodes={'disease': 5, 'drug': 17, 'protein': 311},
      ...,)
>>> h[0].shape
torch.Size([24, 68])
>>> h[1].shape
torch.Size([17, 68])

How did you get h?


When using HeteroGraphConv, the conv module corresponding to each edge type should return a tensor, as in the case of homogeneous graphs. I don’t think that is mentioned in the API doc; it should be fixed.

  1. Data attributes of source and destination nodes

I thought I would need to set block.dstdata['h_dst'] and block.srcdata['h_src'], which is why I copied this definition from Tutorial 6.5 and tried to adapt it. Is it correct to just set these as follows and drop the 'vtype' specification that was there before in block.dstnodes[vtype].data['h_dst']:

h_src, h_dst = h
block.dstdata['h_dst'] = h_dst
block.srcdata['h_src'] = h_src
  1. h

The values of h that are printed here are from within the forward function of CustomHeteroGraphConv.
Btw, I had to adapt this implementation as follows:

import torch.nn as nn
import dgl.function as fn

class CustomHeteroGraphConv(nn.Module):
    def __init__(self, g, in_feats, out_feats):
        super().__init__()
        self.Ws = nn.ModuleDict()
        self.Vs = nn.ModuleDict()
        for etype in g.canonical_etypes:
            utype, rel, vtype = etype
            self.Ws[rel] = nn.Linear(in_feats[utype], out_feats[vtype])
        for ntype in g.ntypes:
            self.Vs[ntype] = nn.Linear(in_feats[ntype], out_feats[ntype])

    def forward(self, block, h):
        """Forward messages with mini-batch implementation

        :param block: one block of the mini-batch
        :param h: a pair of (source, destination) feature tensors when called
            from dgl.nn.HeteroGraphConv (originally a dict mapping node type to features)
        :return:
        """
        with block.local_scope():
            # for ntype in block.ntypes:
            for etype in block.canonical_etypes:
                utype, rel, vtype = etype

                h_src, h_dst = h
                block.dstdata['h_dst'] = self.Vs[vtype](h_dst)
                # slice with [:block.num_src_nodes()] because the number of features
                # did not match the number of nodes
                block.srcdata['h_src'] = h_src[:block.num_src_nodes()]
                block.update_all(
                    fn.copy_u('h_src', 'm'), fn.mean('m', 'h_neigh'),
                    etype=etype)
                block.dstnodes[vtype].data['h_dst'] = \
                    block.dstnodes[vtype].data['h_dst'] + \
                    self.Ws[rel](block.dstnodes[vtype].data['h_neigh'])  # use rel to select the module in Ws
            return {ntype: block.dstnodes[ntype].data['h_dst']
                    for ntype in block.dsttypes}  # changed from block.ntypes to block.dsttypes

@mufeili So you mean that the CustomHeteroGraphConv forward method should return a tensor instead of a dictionary? Should this return statement then

be changed to something like this:

return g.dstnodes.data['h_dst']

?

h_src, h_dst = h
block.dstdata['h_dst'] = h_dst
block.srcdata['h_src'] = h_src

Yes, this looks fine to me.

The values of h that are printed here are from within the forward function of CustomHeteroGraphConv.
Btw, I had to adapt this implementation as follows:

What value did you pass for the node features in the forward function of CustomHeteroGraphConv and how did you get that value?

Probably

return g.dstdata['h_dst']

is enough.
