Node type missing in convolution output from CustomHeteroGraphConv

Hi there!

I am using the CustomHeteroGraphConv as in the 6.5 Tutorial, and I am getting the following error:

    Traceback (most recent call last):
      File "/home/.../.conda/envs/deeplink/lib/python3.8/contextlib.py", line 131, in __exit__
        self.gen.throw(type, value, traceback)
      File "/home/.../lib/python3.8/site-packages/dgl/heterograph.py", line 5219, in local_scope
        yield
      File "/home/.../MLPEdge.py", line 126, in forward_etype
        graph.apply_edges(self.apply_edges, etype=etype)
      File "/home/.../python3.8/site-packages/dgl/heterograph.py", line 4124, in apply_edges
        self._set_e_repr(etid, eid, edata)
      File "/home/.../python3.8/site-packages/dgl/heterograph.py", line 3900, in _set_e_repr
        raise DGLError('Expect number of features to match number of edges.'
    dgl._ffi.base.DGLError: Expect number of features to match number of edges. Got 0 and 1 instead.

This happens in the apply_edges function of my MLPEdge class, which looks as follows:

    def apply_edges(
        self,
        edges,
    ):
        if 'h' in edges._src_data and 'h' in edges._dst_data:
            h_u = edges.src['h']
            h_v = edges.dst['h']
            score = self.linear_2(
                self.linear_1(
                    th.cat(
                        tensors=[h_u, h_v],
                        dim=1,
                    )))
            return {'edge_score': score}
        else:
            if th.cuda.is_available():
                device = th.device('cuda')
            else:
                device = th.device('cpu')
            
            return {'edge_score': th.tensor([], dtype=th.float, device=device)}

    def forward_etype(
        self,
        graph: dgl.DGLHeteroGraph,
        h: Dict[str, th.Tensor],
        etype: str,
    ):
        """Forward computation for the edge representation.

        Args:
            graph (dgl.DGLHeteroGraph): The positive graph.
            h (dict): The node feature for each node type.
            etype (str): The edge type to consider for calculating the edge representations.

        Returns:
            (torch.Tensor): The edge scores for the given edge type.
        """
        with graph.local_scope():
            graph.ndata['h'] = h  # assigns 'h' of all node types in one shot

            graph.apply_edges(self.apply_edges, etype=etype)

            return graph.edges[etype].data['edge_score']

    def forward(
        self,
        g: dgl.DGLHeteroGraph,
        h: Dict[str, th.Tensor],
    ) -> Dict[str, th.Tensor]:
        """Forward computation for the edge representation.

        Args:
            g (dgl.DGLHeteroGraph): The positive or negative DGL heterograph.
            h (dict): The node feature for each node type.

        Returns:
            (dict): The new features for each edge type of the positive or negative graph.
        """
        edge_embedding = dict()
        # check which edge types actually have edges
        existent_etypes = [
            canonical_etype for canonical_etype in g.canonical_etypes
            if g.num_edges(canonical_etype[1]) > 0
        ]

        for etype in existent_etypes:

            edge_embedding.update(
                {
                    etype[1]: self.forward_etype(
                        graph=g,
                        h=h,
                        etype=etype[1],
                    )
                }
            )
        return edge_embedding

So the problem is that an edge type exists (it has edges), but the features of its node types cannot be accessed.

I found that the forward pass of the CustomHeteroGraphConv sometimes does not return features for every node type present in the current block, and I think this is the cause of the issue.
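
A quick way to test this suspicion is to compare the dictionary returned by the convolution with the destination node types that actually have nodes in the block. The helper below is a hypothetical sketch in plain Python, so it can be tried without DGL; with a real block one would pass block.dsttypes, block.num_dst_nodes and the dict returned by CustomHeteroGraphConv.forward. The node type names in the example are made up.

```python
from typing import Callable, Dict, Iterable, List


def missing_output_ntypes(
    dsttypes: Iterable[str],
    num_dst_nodes: Callable[[str], int],
    out: Dict[str, object],
) -> List[str]:
    """Hypothetical helper: destination node types that have nodes but no output features."""
    return [
        ntype
        for ntype in dsttypes
        if num_dst_nodes(ntype) > 0 and ntype not in out
    ]


# Made-up example: "item" has destination nodes but no entry in the output dict,
# while "tag" has no destination nodes at all and is therefore not reported.
dst_counts = {"user": 3, "item": 2, "tag": 0}
conv_out = {"user": "features for user nodes"}
print(missing_output_ntypes(["user", "item", "tag"], dst_counts.get, conv_out))  # → ['item']
```

An empty list would mean every populated node type did receive features, in which case the missing entries in h could come from somewhere else, for example from how h is assembled before MLPEdge is called.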

My CustomHeteroGraphConv looks like this:

    class CustomHeteroGraphConv(nn.Module):
        def __init__(
            self,
            g: dgl.DGLHeteroGraph,
            in_feat: int,
            out_feat: int,
        ):
            """

            Args:
                g (dgl.DGLHeteroGraph): The current heterograph.
                in_feat (int): The input feature size, constant for all node types.
                out_feat (int): The output feature size, constant for all node types.
            """
            super().__init__()
            self.Ws = nn.ModuleDict()
            self.Vs = nn.ModuleDict()
            for etype in g.canonical_etypes:
                utype, rel, vtype = etype
                self.Ws[rel] = nn.Linear(in_feat, out_feat)

            for ntype in g.ntypes:
                self.Vs[ntype] = nn.Linear(in_feat, out_feat)

        def forward(
            self,
            block: dgl.DGLHeteroGraph,
            h: Tuple[th.Tensor, th.Tensor],
        ) -> Dict[str, th.Tensor]:
            """Forward messages with mini-batch implementation.

            Args:
                block (dgl.DGLHeteroGraph): A block of the graph.
                h (dict): The features for the src and dst nodes in the first block for the source and destination node
                    type of the current edge type.
            Returns:
                (dict): The mapping from node type to the convolved features.
            """
            with block.local_scope():
                for ntype in block.ntypes:
                    h_src, h_dst = h
                    block.dstdata['h_dst'] = self.Vs[ntype](h_dst)  # h_dst
                    block.srcdata['h_src'] = h_src

                for etype in block.canonical_etypes:
                    utype, rel, vtype = etype
                    block.update_all(
                        fn.copy_u('h_src', 'm'), fn.mean('m', 'h_neigh'),
                        etype=etype)

                    block.dstdata['h_dst'] = block.dstdata['h_dst'] + self.Ws[rel](block.dstdata['h_neigh'])

                return {
                    ntype: block.dstnodes[ntype].data['h_dst']
                    for ntype in block.dsttypes
                }
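
For comparison, the heterograph example in section 6.5 of the DGL user guide appears to set the self-connection term separately for each destination node type (through block.dstnodes[ntype].data) and to add each edge type's neighbour term only to that edge type's destination type, whereas the loop above writes block.dstdata directly. The plain-Python sketch below keeps just that control flow to show why such a structure always yields one entry per destination type: the result is seeded from dsttypes before any message passing happens. combine_per_ntype, self_term and neigh_term are hypothetical names; in the module, self_term would wrap self.Vs[ntype] applied to the destination features, and neigh_term would wrap self.Ws[rel] applied to the update_all result for that edge type.

```python
from typing import Callable, Dict, Iterable, Tuple, TypeVar

# T stands in for a feature tensor; any type that supports "+" works here.
T = TypeVar("T")
CanonicalEtype = Tuple[str, str, str]  # (src_type, relation, dst_type)


def combine_per_ntype(
    dsttypes: Iterable[str],
    canonical_etypes: Iterable[CanonicalEtype],
    self_term: Callable[[str], T],
    neigh_term: Callable[[CanonicalEtype], T],
) -> Dict[str, T]:
    # Seed one entry per destination node type with its self-connection term,
    # so the returned dict has a key for every type in dsttypes.
    out = {ntype: self_term(ntype) for ntype in dsttypes}
    for etype in canonical_etypes:
        _, _, vtype = etype
        # Add the aggregated neighbour term only to this edge type's destination type.
        out[vtype] = out[vtype] + neigh_term(etype)
    return out


# Toy numbers in place of tensors (made-up values).
self_vals = {"user": 1.0, "item": 10.0}
neigh_vals = {("user", "clicks", "item"): 0.5, ("user", "follows", "user"): 0.25}
result = combine_per_ntype(
    ["user", "item"], list(neigh_vals), self_vals.__getitem__, neigh_vals.__getitem__
)
print(result)  # → {'user': 1.25, 'item': 10.5}
```

Because the dictionary is keyed by destination type from the start, a destination type whose incoming edge types have no edges still keeps its self-connection term instead of disappearing from the output.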

Note: this occurs only during testing, after the model has been trained. The training and testing graphs differ in that the training graph contains all edge types, whereas the testing graph contains only some of them.

Could you help me solve this issue?


It sounds like you can check whether an edge type exists and handle it accordingly. Nevertheless, it’s still quite strange that the graphs have different edge types, and I’m not sure whether this could lead to a significant performance drop.
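
Going by the traceback, the direct trigger is the else branch of apply_edges: it returns a zero-length tensor for an edge type that has one edge, while DGL expects one feature row per edge ("Got 0 and 1"). One way to handle this, sketched under the assumption that skipping such edge types is acceptable for the loss, is to pass to forward_etype only edge types that have edges and whose source and destination node types both appear in h. usable_etypes is a hypothetical helper written in plain Python; with DGL, g.canonical_etypes and g.num_edges could be passed directly.

```python
from typing import Callable, Dict, Iterable, List, Tuple

CanonicalEtype = Tuple[str, str, str]  # (src_type, relation, dst_type)


def usable_etypes(
    canonical_etypes: Iterable[CanonicalEtype],
    num_edges: Callable[[CanonicalEtype], int],
    h: Dict[str, object],
) -> List[CanonicalEtype]:
    """Hypothetical helper: edge types with edges and features on both endpoint types."""
    return [
        etype
        for etype in canonical_etypes
        if num_edges(etype) > 0 and etype[0] in h and etype[2] in h
    ]


# Possible use inside MLPEdge.forward (sketch, not run against DGL here):
#     for etype in usable_etypes(g.canonical_etypes, g.num_edges, h):
#         edge_embedding[etype[1]] = self.forward_etype(graph=g, h=h, etype=etype[1])

# Made-up example: "clicks" has an edge but no "item" features, "similar" has no edges.
etypes = [("user", "clicks", "item"), ("user", "follows", "user"), ("item", "similar", "item")]
edge_counts = {etypes[0]: 1, etypes[1]: 2, etypes[2]: 0}
print(usable_etypes(etypes, edge_counts.get, {"user": "user features"}))
# → [('user', 'follows', 'user')]
```

Edge types that are filtered out simply get no entry in edge_embedding, so whatever consumes that dict has to tolerate missing keys. This avoids the crash but does not explain why the features are missing in the first place.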

I still do not know why this issue occurred, but I tried a different strategy, and the error no longer occurs:

Previously, my test graph contained only one of the edge types, whereas the training and validation graphs contained all of them. I figured that this difference in the characteristics of the graphs was the only reason it worked in training but not in testing. So I tried a new data split in which the test graph also contains all edge types, and it worked. So it might be that the model somehow cannot compute the proper features if the test graph does not have the same number of edge types as the training graph.
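
Rather than relying on a new split alone, it may help to list which edge types lose all of their edges in the test graph, since those are exactly the cases that differ from training. etypes_missing_in_split is a hypothetical helper in plain Python; the per-split counts could be built with {et: g.num_edges(et) for et in g.canonical_etypes} for each split graph.

```python
from typing import Dict, List, Tuple

CanonicalEtype = Tuple[str, str, str]  # (src_type, relation, dst_type)


def etypes_missing_in_split(
    train_counts: Dict[CanonicalEtype, int],
    test_counts: Dict[CanonicalEtype, int],
) -> List[CanonicalEtype]:
    """Hypothetical helper: edge types with edges in training but none in testing."""
    return sorted(
        etype
        for etype, n in train_counts.items()
        if n > 0 and test_counts.get(etype, 0) == 0
    )


# With DGL, each dict could be built as {et: g.num_edges(et) for et in g.canonical_etypes}.
# The counts below are made up for illustration.
train = {("user", "clicks", "item"): 120, ("user", "follows", "user"): 40, ("item", "similar", "item"): 15}
test = {("user", "clicks", "item"): 30, ("user", "follows", "user"): 0}
print(etypes_missing_in_split(train, test))
# → [('item', 'similar', 'item'), ('user', 'follows', 'user')]
```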