Hi,
I’ve been facing a weird problem when processing heterogeneous graphs in DGL: an error is raised from the underlying C++ library. After boiling the issue down, I can reproduce it with the snippet below.
import dgl
g1 = dgl.heterograph({('0', '0_1', '1'): [(0, 1), (1, 0)], ('0', '0_0', '0'): [(0, 1), (1, 0)]})
g2 = dgl.heterograph({('0', '0_1', '1'): [(1, 0), (1, 2)]})
dgl.batch([g1, g2])
This produces the following output:
Traceback (most recent call last):
File "<stdin>", line 1, in <module>
File "/projects/ovcare/classification/ramin/virtualenv/histopath_gnn/slurm/lib/python3.8/site-packages/dgl/batch.py", line 177, in batch
gidx = disjoint_union(graphs[0]._graph.metagraph, [g._graph for g in graphs])
File "/projects/ovcare/classification/ramin/virtualenv/histopath_gnn/slurm/lib/python3.8/site-packages/dgl/heterograph_index.py", line 1254, in disjoint_union
return _CAPI_DGLHeteroDisjointUnion_v2(metagraph, graphs)
File "dgl/_ffi/_cython/./function.pxi", line 293, in dgl._ffi._cy3.core.FunctionBase.__call__
File "dgl/_ffi/_cython/./function.pxi", line 225, in dgl._ffi._cy3.core.FuncCall
File "dgl/_ffi/_cython/./function.pxi", line 215, in dgl._ffi._cy3.core.FuncCall3
dgl._ffi.base.DGLError: [20:49:06] /opt/dgl/src/graph/unit_graph.cc:1211: Check failed: mat.num_rows == mat.num_cols (4 vs. 5) :
Stack trace:
[bt] (0) /projects/ovcare/classification/ramin/virtualenv/histopath_gnn/slurm/lib/python3.8/site-packages/dgl/libdgl.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x4f) [0x7f511326beaf]
[bt] (1) /projects/ovcare/classification/ramin/virtualenv/histopath_gnn/slurm/lib/python3.8/site-packages/dgl/libdgl.so(dgl::UnitGraph::CreateFromCOO(long, dgl::aten::COOMatrix const&, unsigned char)+0x28c) [0x7f511370064c]
[bt] (2) /projects/ovcare/classification/ramin/virtualenv/histopath_gnn/slurm/lib/python3.8/site-packages/dgl/libdgl.so(dgl::DisjointUnionHeteroGraph2(std::shared_ptr<dgl::GraphInterface>, std::vector<std::shared_ptr<dgl::BaseHeteroGraph>, std::allocator<std::shared_ptr<dgl::BaseHeteroGraph> > > const&)+0x5a1) [0x7f51136efc31]
[bt] (3) /projects/ovcare/classification/ramin/virtualenv/histopath_gnn/slurm/lib/python3.8/site-packages/dgl/libdgl.so(+0x6c0cdf) [0x7f511361acdf]
[bt] (4) /projects/ovcare/classification/ramin/virtualenv/histopath_gnn/slurm/lib/python3.8/site-packages/dgl/libdgl.so(+0x6c1344) [0x7f511361b344]
[bt] (5) /projects/ovcare/classification/ramin/virtualenv/histopath_gnn/slurm/lib/python3.8/site-packages/dgl/libdgl.so(DGLFuncCall+0x48) [0x7f5113594398]
[bt] (6) /projects/ovcare/classification/ramin/virtualenv/histopath_gnn/slurm/lib/python3.8/site-packages/dgl/_ffi/_cy3/core.cpython-38-x86_64-linux-gnu.so(+0x1632c) [0x7f50dda2f32c]
[bt] (7) /projects/ovcare/classification/ramin/virtualenv/histopath_gnn/slurm/lib/python3.8/site-packages/dgl/_ffi/_cy3/core.cpython-38-x86_64-linux-gnu.so(+0x1685b) [0x7f50dda2f85b]
[bt] (8) python(_PyObject_MakeTpCall+0x22f) [0x55aa6525785f]
I think the main problem is that the second graph is missing the edge type ('0', '0_0', '0'). As a result, DGL gets the dimensions wrong when it builds the batched graph. Isn’t this something DGL should be able to handle? I’m not sure whether I’m doing something wrong or whether this is actually a bug.
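To make the suspected mismatch concrete, here is a minimal sketch of a check that could be run before batching. The helper name `missing_edge_types` is hypothetical and not part of DGL; it works on plain lists of canonical edge types, which I assume could be read from each graph via `g.canonical_etypes`. If my guess is right, the "4 vs. 5" in the error may come from the first relation of g1 ('0' to '0', 2 x 2) being combined with the first relation of g2 ('0' to '1', 2 x 3), but I have not confirmed that.

```python
def missing_edge_types(schemas):
    """For each graph, return the canonical edge types it lacks
    relative to the union of edge types over all graphs."""
    all_types = set()
    for schema in schemas:
        all_types.update(schema)
    return [sorted(all_types - set(schema)) for schema in schemas]


# Canonical edge types of g1 and g2, copied by hand from the snippet above.
# Assumption: with DGL these would come from g1.canonical_etypes, etc.
g1_etypes = [('0', '0_1', '1'), ('0', '0_0', '0')]
g2_etypes = [('0', '0_1', '1')]

report = missing_edge_types([g1_etypes, g2_etypes])
for i, missing in enumerate(report):
    print(f"graph {i}: missing edge types {missing}")
```

A non-empty entry in `report` would flag a graph whose set of relations differs from the others, which is the situation I suspect `dgl.batch` does not handle here.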
I would appreciate any help with it!