Batching Heterogeneous graphs with different edge types

Hi,

I’ve been facing a weird problem in DGL heterogeneous graph processing, with errors coming from the underlying C++ library. After boiling the issue down, I could reproduce it with the snippet below.

import dgl

g1 = dgl.heterograph({('0', '0_1', '1'): [(0, 1), (1, 0)], ('0', '0_0', '0'): [(0, 1), (1, 0)]})
g2 = dgl.heterograph({('0', '0_1', '1'): [(1, 0), (1, 2)]})
dgl.batch([g1, g2])

The output looks like this:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/projects/ovcare/classification/ramin/virtualenv/histopath_gnn/slurm/lib/python3.8/site-packages/dgl/batch.py", line 177, in batch
    gidx = disjoint_union(graphs[0]._graph.metagraph, [g._graph for g in graphs])
  File "/projects/ovcare/classification/ramin/virtualenv/histopath_gnn/slurm/lib/python3.8/site-packages/dgl/heterograph_index.py", line 1254, in disjoint_union
    return _CAPI_DGLHeteroDisjointUnion_v2(metagraph, graphs)
  File "dgl/_ffi/_cython/./function.pxi", line 293, in dgl._ffi._cy3.core.FunctionBase.__call__
  File "dgl/_ffi/_cython/./function.pxi", line 225, in dgl._ffi._cy3.core.FuncCall
  File "dgl/_ffi/_cython/./function.pxi", line 215, in dgl._ffi._cy3.core.FuncCall3
dgl._ffi.base.DGLError: [20:49:06] /opt/dgl/src/graph/unit_graph.cc:1211: Check failed: mat.num_rows == mat.num_cols (4 vs. 5) :
Stack trace:
  [bt] (0) /projects/ovcare/classification/ramin/virtualenv/histopath_gnn/slurm/lib/python3.8/site-packages/dgl/libdgl.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x4f) [0x7f511326beaf]
  [bt] (1) /projects/ovcare/classification/ramin/virtualenv/histopath_gnn/slurm/lib/python3.8/site-packages/dgl/libdgl.so(dgl::UnitGraph::CreateFromCOO(long, dgl::aten::COOMatrix const&, unsigned char)+0x28c) [0x7f511370064c]
  [bt] (2) /projects/ovcare/classification/ramin/virtualenv/histopath_gnn/slurm/lib/python3.8/site-packages/dgl/libdgl.so(dgl::DisjointUnionHeteroGraph2(std::shared_ptr<dgl::GraphInterface>, std::vector<std::shared_ptr<dgl::BaseHeteroGraph>, std::allocator<std::shared_ptr<dgl::BaseHeteroGraph> > > const&)+0x5a1) [0x7f51136efc31]
  [bt] (3) /projects/ovcare/classification/ramin/virtualenv/histopath_gnn/slurm/lib/python3.8/site-packages/dgl/libdgl.so(+0x6c0cdf) [0x7f511361acdf]
  [bt] (4) /projects/ovcare/classification/ramin/virtualenv/histopath_gnn/slurm/lib/python3.8/site-packages/dgl/libdgl.so(+0x6c1344) [0x7f511361b344]
  [bt] (5) /projects/ovcare/classification/ramin/virtualenv/histopath_gnn/slurm/lib/python3.8/site-packages/dgl/libdgl.so(DGLFuncCall+0x48) [0x7f5113594398]
  [bt] (6) /projects/ovcare/classification/ramin/virtualenv/histopath_gnn/slurm/lib/python3.8/site-packages/dgl/_ffi/_cy3/core.cpython-38-x86_64-linux-gnu.so(+0x1632c) [0x7f50dda2f32c]
  [bt] (7) /projects/ovcare/classification/ramin/virtualenv/histopath_gnn/slurm/lib/python3.8/site-packages/dgl/_ffi/_cy3/core.cpython-38-x86_64-linux-gnu.so(+0x1685b) [0x7f50dda2f85b]
  [bt] (8) python(_PyObject_MakeTpCall+0x22f) [0x55aa6525785f]

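I think the main problem here is that the second graph is missing the edge type ('0', '0_0', '0'), so when creating the batched graph DGL ends up combining relations whose dimensions don't match. That would also explain the numbers in the failed check: 4 vs. 5 is what you get by stacking g1's ('0', '0_0', '0') relation (2×2) with g2's ('0', '0_1', '1') relation (2×3). Isn’t this something that DGL should be able to handle? I'm not sure whether I’m doing something wrong or it's actually a bug.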
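To confirm a mismatch like this before calling dgl.batch, one option is to compare each graph's canonical edge types against the union over the whole batch. Below is a minimal pure-Python sketch. `missing_etypes` is a hypothetical helper, not a DGL function. With real graphs you would pass `[g.canonical_etypes for g in graphs]`, since DGL's `canonical_etypes` property gives the (srctype, etype, dsttype) tuples. Here they are written out by hand for g1 and g2 from the snippet above.

```python
from typing import Iterable, List, Tuple

CanonicalEType = Tuple[str, str, str]


def missing_etypes(etype_lists: Iterable[Iterable[CanonicalEType]]) -> List[List[CanonicalEType]]:
    """For each graph, list the canonical edge types that appear somewhere in
    the batch but are absent from that graph (sorted for stable output)."""
    etype_sets = [set(etypes) for etypes in etype_lists]
    union = set().union(*etype_sets)
    return [sorted(union - s) for s in etype_sets]


# Canonical edge types of g1 and g2 from the reproduction snippet.
g1_etypes = [('0', '0_0', '0'), ('0', '0_1', '1')]
g2_etypes = [('0', '0_1', '1')]
print(missing_etypes([g1_etypes, g2_etypes]))
# [[], [('0', '0_0', '0')]]
```

An empty inner list for every graph means all graphs agree on their edge types. Node types could be compared the same way using `g.ntypes`.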

I would appreciate any help with it!

It turns out that all graphs must have the same metagraph (the same node types and canonical edge types) to be batched. Making the metagraphs of all graphs match solved the problem.
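One way to make the metagraphs match is to pad every graph's data dict with the edge types it lacks, giving those relations zero edges. The sketch below assumes the (src, dst) edge format accepted by `dgl.heterograph` in DGL 0.5 and later, rather than the list-of-pairs format in the original snippet. `pad_edge_dicts` and its `empty` argument are hypothetical names for illustration, not part of DGL. The helper itself is framework-agnostic, and plain lists stand in for tensors here.

```python
from typing import Callable, Dict, List, Tuple

CanonicalEType = Tuple[str, str, str]


def pad_edge_dicts(edge_dicts: List[Dict[CanonicalEType, object]],
                   empty: Callable[[], object]) -> List[Dict[CanonicalEType, object]]:
    """Return copies of the per-graph data dicts so that every dict has the
    same canonical edge types; missing types get empty() as their edge data."""
    all_etypes = sorted(set().union(*(d.keys() for d in edge_dicts)))
    return [{et: (d[et] if et in d else empty()) for et in all_etypes}
            for d in edge_dicts]


# Edge data for g1 and g2 from the reproduction snippet, as (src, dst) lists.
d1 = {('0', '0_1', '1'): ([0, 1], [1, 0]), ('0', '0_0', '0'): ([0, 1], [1, 0])}
d2 = {('0', '0_1', '1'): ([1, 1], [0, 2])}
p1, p2 = pad_edge_dicts([d1, d2], empty=lambda: ([], []))
print(sorted(p2))
# [('0', '0_0', '0'), ('0', '0_1', '1')]
```

With DGL you would then build each graph with `dgl.heterograph(p)`. In that case, use an `empty` factory that returns empty integer tensors, for example `lambda: (torch.tensor([], dtype=torch.int64), torch.tensor([], dtype=torch.int64))`, so the padded relation has a valid ID dtype. Node counts are inferred from the edges unless you also pass `num_nodes_dict`.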

