Creating heterogeneous graphs with non-complete canonical types

Suppose I have the following minimal example:

First, a bipartite graph in which node w has no edges (of this relation type):

import dgl
import networkx as nx
subgraph_1 = nx.DiGraph()
subgraph_1.add_nodes_from(['u', 'v', 'w'], bipartite=0)
subgraph_1.add_nodes_from(['a', 'b'], bipartite=1)
subgraph_1.add_edges_from([('u', 'a'), ('u', 'b'), ('v', 'b')])

And a second bipartite graph in which the nodes u, v, and w each have an edge to at least one node in the other group:

subgraph_2 = nx.DiGraph()
subgraph_2.add_nodes_from(['u', 'v', 'w'], bipartite=0)
subgraph_2.add_nodes_from(['c', 'd'], bipartite=1)
subgraph_2.add_edges_from([('u', 'a'), ('v', 'b'), ('w', 'd')])

How can I construct a heterograph from these two graphs in DGL? Suppose I try:

g = dgl.heterograph({
      ('type1', 'partial', 'type2'): subgraph_1,
      ('type 2', 'complete', 'type3'): subgraph_2
})

I will get an error:

/Users/xiangsx/work/dgl/dgl/src/graph/heterograph.cc:139: Check failed: num_verts_per_type_[srctype] == nv (2 vs. 3) : Mismatch number of vertices for vertex type 1
Stack trace:
  [bt] (0) 1   libdgl.dylib                        0x00000001301e0309 dmlc::LogMessageFatal::~LogMessageFatal() + 57
  [bt] (1) 2   libdgl.dylib                        0x00000001309f2746 dgl::HeteroGraph::HeteroGraph(std::__1::shared_ptr<dgl::GraphInterface>, std::__1::vector<std::__1::shared_ptr<dgl::BaseHeteroGraph>, std::__1::allocator<std::__1::shared_ptr<dgl::BaseHeteroGraph> > > const&) + 1622
  [bt] (2) 3   libdgl.dylib                        0x00000001309f6e9f dgl::CreateHeteroGraph(std::__1::shared_ptr<dgl::GraphInterface>, std::__1::vector<std::__1::shared_ptr<dgl::BaseHeteroGraph>, std::__1::allocator<std::__1::shared_ptr<dgl::BaseHeteroGraph> > > const&) + 79
  [bt] (3) 4   libdgl.dylib                        0x00000001309fb5ca std::__1::__function::__func<dgl::$_3, std::__1::allocator<dgl::$_3>, void (dgl::runtime::DGLArgs, dgl::runtime::DGLRetValue*)>::operator()(dgl::runtime::DGLArgs&&, dgl::runtime::DGLRetValue*&&) + 618
  [bt] (4) 5   libdgl.dylib                        0x0000000130997de6 DGLFuncCall + 70
  [bt] (5) 6   core.cpython-37m-darwin.so          0x0000000130fae69c __pyx_f_3dgl_4_ffi_4_cy3_4core_FuncCall(void*, _object*, DGLValue*, int*) + 460
  [bt] (6) 7   core.cpython-37m-darwin.so          0x0000000130fb2c27 __pyx_pw_3dgl_4_ffi_4_cy3_4core_12FunctionBase_5__call__(_object*, _object*, _object*) + 55
  [bt] (7) 8   Python                              0x000000010fe90918 _PyObject_FastCallKeywords + 358
  [bt] (8) 9   Python                              0x000000010ff25ef0 call_function + 746

This happens because bipartite, when called on the first graph, does not add the node 'w': it adds nodes based on the existing edges. Similarly, hetero_from_relations will not merge subgraphs whose node counts disagree. add_nodes is also not implemented for heterogeneous graphs, so I cannot create the subgraph by calling bipartite and then adding the missing nodes afterwards.
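To make the count mismatch concrete, here is a minimal pure-Python sketch (no DGL calls; the helper name sources_seen is made up for illustration). It compares the source nodes that can be inferred from the edges of the first relation with the source nodes that were declared, which may be what the "(2 vs. 3)" check in the error is reporting.

```python
# Declared source-side nodes and the edges of the first relation,
# copied from the example above.
declared_sources = ['u', 'v', 'w']
partial_edges = [('u', 'a'), ('u', 'b'), ('v', 'b')]


def sources_seen(edges):
    """Return the sorted source nodes that appear in at least one edge."""
    return sorted({src for src, _ in edges})


# Nodes recoverable from the edges alone: 'w' never appears as a source.
partial_sources = sources_seen(partial_edges)

# Declared nodes that an edge-based construction would miss.
missing = sorted(set(declared_sources) - set(partial_sources))

print(len(partial_sources), 'vs.', len(declared_sources))
print('missing:', missing)
```

Any constructor that infers the node set from the edges alone would see two source nodes for this relation, while the second relation sees three.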

Is there a workaround for creating a graph like this? I could work on something that allows it, but we would need to discuss how it should be implemented. Or is it the case that a heterogeneous graph will (as it does now) require every source and sink node of a given type to have an edge in each canonical edge type? That does not seem desirable, as it will not hold beyond toy or benchmarking examples. It would be really nice to convert our existing DGL code to these fairly new heterographs, as that would allow broader architectural choices.

How about

import dgl
import networkx as nx

subgraph_1 = nx.DiGraph()
subgraph_1.add_nodes_from(['u', 'v', 'w'], bipartite=0)
subgraph_1.add_nodes_from(['a', 'b'], bipartite=1)
subgraph_1.add_edges_from([('u', 'a'), ('u', 'b'), ('v', 'b')])
subgraph_1 = dgl.bipartite(subgraph_1, utype='v1', etype='e1', vtype='v2')

subgraph_2 = nx.DiGraph()
subgraph_2.add_nodes_from(['u', 'v', 'w'], bipartite=0)
subgraph_2.add_nodes_from(['c', 'd'], bipartite=1)
subgraph_2.add_edges_from([('u', 'c'), ('v', 'd'), ('w', 'd')])
subgraph_2 = dgl.bipartite(subgraph_2, utype='v3', etype='e2', vtype='v4')

g = dgl.hetero_from_relations([subgraph_1, subgraph_2])

You can first convert networkx bipartite graphs to dgl bipartite graphs and then use hetero_from_relations.

Thanks for reporting. Bug confirmed. Raised issue in https://github.com/dmlc/dgl/issues/967

I did intend to mention that u, v, and w were of the same type in both graphs.

import dgl
import networkx as nx

subgraph_1 = nx.DiGraph()
subgraph_1.add_nodes_from(['u', 'v', 'w'], bipartite=0)
subgraph_1.add_nodes_from(['a', 'b'], bipartite=1)
subgraph_1.add_edges_from([('u', 'a'), ('u', 'b'), ('v', 'b')])

subgraph_2 = nx.DiGraph()
subgraph_2.add_nodes_from(['u', 'v', 'w'], bipartite=0)
subgraph_2.add_nodes_from(['c', 'd'], bipartite=1)
subgraph_2.add_edges_from([('u', 'c'), ('v', 'd'), ('w', 'd')])

g = dgl.heterograph({
      ('type1', 'partial', 'type2'): subgraph_1,
      ('type1', 'complete', 'type3'): subgraph_2
})

The code above should work. There are two problems with your original code:

subgraph_2.add_edges_from([('u', 'a'), ('v', 'b'), ('w', 'd')])

should be

subgraph_2.add_edges_from([('u', 'c'), ('v', 'd'), ('w', 'd')])

and

g = dgl.heterograph({
      ('type1', 'partial', 'type2'): subgraph_1,
      ('type 2', 'complete', 'type3'): subgraph_2
})

should be

g = dgl.heterograph({
      ('type1', 'partial', 'type2'): subgraph_1,
      ('type1', 'complete', 'type3'): subgraph_2
})
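For anyone who wants isolated nodes such as 'w' to survive regardless of how node counts are inferred, one option is to assign integer IDs per node type yourself and keep explicit per-type node counts next to the edge lists. The sketch below is plain Python with no DGL calls; the names index_nodes, edge_ids, and num_nodes are made up for illustration. More recent DGL releases document a num_nodes_dict argument to dgl.heterograph that accepts counts of this shape, but check the version you have installed before relying on that.

```python
def index_nodes(names):
    """Map node names to consecutive integer IDs in the given order."""
    return {name: i for i, name in enumerate(names)}


# Node sets per type; 'type1' is shared by both relations.
nodes = {
    'type1': ['u', 'v', 'w'],
    'type2': ['a', 'b'],
    'type3': ['c', 'd'],
}
ids = {ntype: index_nodes(names) for ntype, names in nodes.items()}

# Edges per canonical edge type, using the corrected second relation.
relations = {
    ('type1', 'partial', 'type2'): [('u', 'a'), ('u', 'b'), ('v', 'b')],
    ('type1', 'complete', 'type3'): [('u', 'c'), ('v', 'd'), ('w', 'd')],
}

# Translate names to IDs; an endpoint that was never declared for its
# type raises KeyError here instead of silently creating a new node.
edge_ids = {}
for (stype, etype, dtype), edges in relations.items():
    src = [ids[stype][s] for s, _ in edges]
    dst = [ids[dtype][d] for _, d in edges]
    edge_ids[(stype, etype, dtype)] = (src, dst)

# Explicit node counts, so 'w' is counted even in the 'partial' relation.
num_nodes = {ntype: len(names) for ntype, names in nodes.items()}

print(edge_ids)
print(num_nodes)
```

Because the lookup is per node type, the first typo in the original code (edges to 'a' and 'b' in a graph whose other side is 'c' and 'd') would fail immediately with a KeyError rather than adding extra nodes.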