Edge subgraph for Heterograph fails

Hi there!

I tried to use dgl.edge_subgraph (documentation here) to create a subgraph from the edges (rather than the nodes) of my entire heterograph, but it throws this error:

  File "/.../train_val_test_split.py", line 999, in single_train_test_split
    train_subgraph = dgl.edge_subgraph(
  File "/.../python3.8/site-packages/dgl/subgraph.py", line 279, in edge_subgraph
    sgi = graph._graph.edge_subgraph(induced_edges, preserve_nodes)
  File "/.../python3.8/site-packages/dgl/heterograph_index.py", line 824, in edge_subgraph
    return _CAPI_DGLHeteroEdgeSubgraph(self, eids, preserve_nodes)
  File "dgl/_ffi/_cython/./function.pxi", line 287, in dgl._ffi._cy3.core.FunctionBase.__call__
  File "dgl/_ffi/_cython/./function.pxi", line 222, in dgl._ffi._cy3.core.FuncCall
  File "dgl/_ffi/_cython/./function.pxi", line 211, in dgl._ffi._cy3.core.FuncCall3
  File "dgl/_ffi/_cython/./base.pxi", line 155, in dgl._ffi._cy3.core.CALL
dgl._ffi.base.DGLError: [11:52:57] /opt/dgl/src/array/cpu/array_index_select.cc:22: Check failed: idx_data[i] < arr_len (315 vs. 277) : Index out of range.
Stack trace:
  [bt] (0) /.../python3.8/site-packages/dgl/libdgl.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x4f) [0x7fdc06bf8daf]
  [bt] (1) /..../python3.8/site-packages/dgl/libdgl.so(dgl::runtime::NDArray dgl::aten::impl::IndexSelect<(DLDeviceType)1, long, long>(dgl::runtime::NDArray, dgl::runtime::NDArray)+0x1aa) [0x7fdc06c1468a]
  [bt] (2) ..../python3.8/site-packages/dgl/libdgl.so(dgl::aten::IndexSelect(dgl::runtime::NDArray, dgl::runtime::NDArray)+0xbd2) [0x7fdc06be4142]
  [bt] (3) /.../python3.8/site-packages/dgl/libdgl.so(dgl::UnitGraph::COO::EdgeSubgraph(std::vector<dgl::runtime::NDArray, std::allocator<dgl::runtime::NDArray> > const&, bool) const+0x506) [0x7fdc073f86b6]
  [bt] (4) .../python3.8/site-packages/dgl/libdgl.so(dgl::UnitGraph::EdgeSubgraph(std::vector<dgl::runtime::NDArray, std::allocator<dgl::runtime::NDArray> > const&, bool) const+0x69) [0x7fdc073ed699]
  [bt] (5) /.../python3.8/site-packages/dgl/libdgl.so(dgl::HeteroGraph::EdgeSubgraph(std::vector<dgl::runtime::NDArray, std::allocator<dgl::runtime::NDArray> > const&, bool) const+0x449) [0x7fdc07318f89]
  [bt] (6) /.../python3.8/site-packages/dgl/libdgl.so(+0xc5c9ab) [0x7fdc073289ab]
  [bt] (7) /.../python3.8/site-packages/dgl/libdgl.so(DGLFuncCall+0x48) [0x7fdc072b7f88]
  [bt] (8) ...python3.8/site-packages/dgl/_ffi/_cy3/core.cpython-38-x86_64-linux-gnu.so(+0x16f99) [0x7fdbf8686f99]

Do you know why this could happen? I created the edge dictionary as follows:

    # collector for edge ids for new subgraphs
    edge_id_train_collector = dict()
    edge_id_test_collector = dict()

    # get edge id array for every edge type
    for canonical_etype in heterograph.g.canonical_etypes:

        # edge ids are specific to the edge type, all starting at 0
        edge_ids = heterograph.g.edges(form='eid', etype=canonical_etype)

        # shuffle and split edge id tensors
        train_edges, test_edges = sklearn.model_selection.train_test_split(
            edge_ids,
            test_size=test_size,
            train_size=train_size,
            random_state=seed,
        )
        edge_id_train_collector[canonical_etype] = train_edges
        edge_id_test_collector[canonical_etype] = test_edges

    # create edge subgraph
    train_subgraph = dgl.edge_subgraph(
        graph=dgl_heterograph,
        edges=edge_id_train_collector,
        preserve_nodes=True,
    )
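
For reference, the failed check in the traceback (idx_data[i] < arr_len, 315 vs. 277) says that an index of 315 was looked up in an array of length 277. Below is a minimal sketch of a sanity check that could be run on the edge ID dictionary before the call. It is plain Python with no DGL calls; the function name, the edge type and the counts are hypothetical and only mirror the numbers in the error message. With a real graph, the per-type counts would presumably come from something like g.num_edges(etype).

```python
def find_out_of_range_ids(edge_ids_by_type, num_edges_by_type):
    """Return {etype: [bad ids]} for IDs outside [0, num_edges) of that type."""
    bad = {}
    for etype, ids in edge_ids_by_type.items():
        limit = num_edges_by_type[etype]
        # Edge IDs are local to each edge type, so compare against that type's count.
        offenders = [int(i) for i in ids if not 0 <= int(i) < limit]
        if offenders:
            bad[etype] = offenders
    return bad


# Hypothetical example: 277 edges of one type, but an ID of 315 slipped in.
etype = ("user", "follows", "user")
report = find_out_of_range_ids({etype: [0, 12, 315]}, {etype: 277})
print(report)
```

An empty result means every ID fits its edge type; any entry points to the edge type whose IDs need a closer look.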

Note: The documentation (version 6.0) says that the input for the edges parameter should be a dictionary mapping edge types to nodes. Is this correct? Or should the values rather be the edge ids?

If the graph is homogeneous, one can directly pass the above formats. Otherwise, the argument must be a dictionary with keys being edge types and values being the nodes.

Can you provide a code snippet for reproducing the issue?

Note: The documentation (version 6.0) says that the input for the edges parameter should be a dictionary mapping edge types to nodes. Is this correct? Or should the values rather be the edge ids?

You are right. This should be addressed in this PR.

@mufeili I managed to solve this issue, and I think the problem was in the edges parameter. I had to sort the edge ids:

edge_id_train_collector[canonical_etype] = sorted(train_edges)
edge_id_test_collector[canonical_etype] = sorted(test_edges)
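
For anyone who wants to try the idea without a graph at hand, here is a rough, self-contained sketch of a per-edge-type split that returns sorted ID lists. It is an assumption-laden stand-in, not the code from the post: it uses the standard library's random module instead of sklearn.model_selection.train_test_split, and split_edge_ids, test_fraction and the count of 277 are made up for illustration.

```python
import random


def split_edge_ids(num_edges, test_fraction, seed):
    """Shuffle the IDs 0..num_edges-1 and return (train_ids, test_ids), both sorted."""
    ids = list(range(num_edges))
    random.Random(seed).shuffle(ids)  # seeded so the split is reproducible
    n_test = int(round(num_edges * test_fraction))
    test_ids = sorted(ids[:n_test])
    train_ids = sorted(ids[n_test:])
    return train_ids, test_ids


# Hypothetical edge type with 277 edges, 20% held out for testing.
train_ids, test_ids = split_edge_ids(num_edges=277, test_fraction=0.2, seed=0)
print(len(train_ids), len(test_ids))
```

The two lists are disjoint, cover every ID exactly once and stay within the edge count of the type, so they can be collected into one dictionary per split, keyed by canonical edge type.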