Hi there!
I tried to use dgl.edge_subgraph to create a subgraph from the edges (specifically, not from the nodes) of my entire heterograph, but it throws this error:
File "/.../train_val_test_split.py", line 999, in single_train_test_split
train_subgraph = dgl.edge_subgraph(
File "/.../python3.8/site-packages/dgl/subgraph.py", line 279, in edge_subgraph
sgi = graph._graph.edge_subgraph(induced_edges, preserve_nodes)
File "/.../python3.8/site-packages/dgl/heterograph_index.py", line 824, in edge_subgraph
return _CAPI_DGLHeteroEdgeSubgraph(self, eids, preserve_nodes)
File "dgl/_ffi/_cython/./function.pxi", line 287, in dgl._ffi._cy3.core.FunctionBase.__call__
File "dgl/_ffi/_cython/./function.pxi", line 222, in dgl._ffi._cy3.core.FuncCall
File "dgl/_ffi/_cython/./function.pxi", line 211, in dgl._ffi._cy3.core.FuncCall3
File "dgl/_ffi/_cython/./base.pxi", line 155, in dgl._ffi._cy3.core.CALL
dgl._ffi.base.DGLError: [11:52:57] /opt/dgl/src/array/cpu/array_index_select.cc:22: Check failed: idx_data[i] < arr_len (315 vs. 277) : Index out of range.
Stack trace:
[bt] (0) /.../python3.8/site-packages/dgl/libdgl.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x4f) [0x7fdc06bf8daf]
[bt] (1) /..../python3.8/site-packages/dgl/libdgl.so(dgl::runtime::NDArray dgl::aten::impl::IndexSelect<(DLDeviceType)1, long, long>(dgl::runtime::NDArray, dgl::runtime::NDArray)+0x1aa) [0x7fdc06c1468a]
[bt] (2) ..../python3.8/site-packages/dgl/libdgl.so(dgl::aten::IndexSelect(dgl::runtime::NDArray, dgl::runtime::NDArray)+0xbd2) [0x7fdc06be4142]
[bt] (3) /.../python3.8/site-packages/dgl/libdgl.so(dgl::UnitGraph::COO::EdgeSubgraph(std::vector<dgl::runtime::NDArray, std::allocator<dgl::runtime::NDArray> > const&, bool) const+0x506) [0x7fdc073f86b6]
[bt] (4) .../python3.8/site-packages/dgl/libdgl.so(dgl::UnitGraph::EdgeSubgraph(std::vector<dgl::runtime::NDArray, std::allocator<dgl::runtime::NDArray> > const&, bool) const+0x69) [0x7fdc073ed699]
[bt] (5) /.../python3.8/site-packages/dgl/libdgl.so(dgl::HeteroGraph::EdgeSubgraph(std::vector<dgl::runtime::NDArray, std::allocator<dgl::runtime::NDArray> > const&, bool) const+0x449) [0x7fdc07318f89]
[bt] (6) /.../python3.8/site-packages/dgl/libdgl.so(+0xc5c9ab) [0x7fdc073289ab]
[bt] (7) /.../python3.8/site-packages/dgl/libdgl.so(DGLFuncCall+0x48) [0x7fdc072b7f88]
[bt] (8) ...python3.8/site-packages/dgl/_ffi/_cy3/core.cpython-38-x86_64-linux-gnu.so(+0x16f99) [0x7fdbf8686f99]
Do you know why this could happen? I created the edge dictionary as follows:
# collector for edge ids for new subgraphs
edge_id_train_collector = dict()
edge_id_test_collector = dict()

# get edge id array for every edge type
for canonical_etype in heterograph.g.canonical_etypes:
    # edge ids are specific to the edge type, all starting at 0
    edge_ids = heterograph.g.edges(form='eid', etype=canonical_etype)

    # shuffle and split edge id tensors
    train_edges, test_edges = sklearn.model_selection.train_test_split(
        edge_ids,
        test_size=test_size,
        train_size=train_size,
        random_state=seed,
    )
    edge_id_train_collector[canonical_etype] = train_edges
    edge_id_test_collector[canonical_etype] = test_edges

# create edge subgraph
train_subgraph = dgl.edge_subgraph(
    graph=dgl_heterograph,
    edges=edge_id_train_collector,
    preserve_nodes=True,
)
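Since the failed check (315 vs. 277) suggests that an index of 315 was looked up in an array of length 277, one sanity check that might help narrow this down is to compare every collected edge ID against the number of edges of its type before calling dgl.edge_subgraph. The following is only a sketch in plain Python: the helper name `find_out_of_range_ids`, the edge types, and the counts are all hypothetical, and in the real script the counts would presumably come from something like `g.num_edges(etype)` on the same graph object that is passed to dgl.edge_subgraph.

```python
def find_out_of_range_ids(edge_ids_by_type, num_edges_by_type):
    """Return {etype: [bad ids]} for edge IDs outside [0, num_edges).

    Hypothetical helper, not part of DGL.
    """
    problems = {}
    for etype, ids in edge_ids_by_type.items():
        limit = num_edges_by_type[etype]
        # int() also works for 0-dim tensor or NumPy scalar elements
        bad = [int(i) for i in ids if not 0 <= int(i) < limit]
        if bad:
            problems[etype] = bad
    return problems


# Made-up example data (assumption): one edge type contains an ID
# that exceeds the number of edges of that type.
edge_ids = {
    ("user", "follows", "user"): [0, 5, 315],
    ("user", "likes", "item"): [1, 2],
}
edge_counts = {
    ("user", "follows", "user"): 277,
    ("user", "likes", "item"): 10,
}

problems = find_out_of_range_ids(edge_ids, edge_counts)
print(problems)
```

If such a check reports anything, the edge IDs and the graph passed to dgl.edge_subgraph may not belong together (for instance, IDs taken from one graph object and applied to another).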
Note: The documentation (version 6.0) says that the input for the edges parameter should be a dictionary mapping edge types to nodes. Is this correct, or should the values be the edge IDs instead? The relevant passage reads:
"If the graph is homogeneous, one can directly pass the above formats. Otherwise, the argument must be a dictionary with keys being edge types and values being the nodes."