Merging multiple graphs into a single graph

acho · February 28, 2020, 2:57pm

Hi all,

Is there an easy/efficient way to merge multiple graphs into a single graph? I considered using BatchedDGLGraph, but I would need to add some new edges between the nodes of the different sub-graphs afterwards and BatchedDGLGraph is read-only.

Thank you

zihao · February 28, 2020, 3:17pm

Hi @acho,
I suggest using dgl.batch, but you mentioned that you need to add new edges. My question is: how would you like to initialize the edge features of the new edges?

I’m refactoring the code to merge BatchedDGLGraph with DGLGraph, and ideally you will be able to add new edges to batched graphs. I’m interested in how you would deal with the features of these new edges.

acho · February 28, 2020, 3:36pm

Hi. I assumed that they could be initialised the same way they are initialised in a regular DGLGraph. I think I am probably not understanding the problem.

zihao · March 2, 2020, 7:20am

I mean that when edges in your DGLGraph already have attributes, the attributes of newly added edges need to be initialized too (by default we initialize them to all zeros). Ignore this if it’s not important in your case.
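To make the zero-initialization default concrete, here is a minimal sketch in plain Python. It does not call DGL; the helper name `add_edges_with_zero_features` and the list-of-rows feature table are assumptions made purely for illustration.

```python
def add_edges_with_zero_features(edge_feats, num_new_edges):
    """Return a copy of the feature table with one all-zero row per new edge.

    edge_feats: list of equal-length rows, one row per existing edge
    (a hypothetical stand-in for an edge feature tensor).
    """
    if not edge_feats:
        raise ValueError("need at least one existing edge to infer the feature width")
    width = len(edge_feats[0])
    # Existing rows are copied unchanged; new edges get zero-filled rows.
    copied = [list(row) for row in edge_feats]
    zeros = [[0.0] * width for _ in range(num_new_edges)]
    return copied + zeros


existing = [[1.0, 2.0], [3.0, 4.0]]   # features of two existing edges
extended = add_edges_with_zero_features(existing, 2)
print(extended)
```

If zeros are not a sensible default for your features, the new rows can simply be overwritten after the edges are added.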

We will provide a flatten interface for batched graphs, which treats the batched graph as a single graph that you can then mutate.

acho · March 2, 2020, 11:02am

Yes, copying already existing edge attributes and initialising new edge attributes as zero seems pretty reasonable.

Thank you.

zihao · March 9, 2020, 8:16am

@acho, we have merged DGLGraph and BatchedDGLGraph and provided the flatten API in the master branch. You can try this new feature by installing from source on the master branch or by installing the nightly build with pip.

Now you can add new nodes/edges on the batched graph:

>>> import dgl
>>> import torch
>>> g = dgl.DGLGraph()
>>> g.add_nodes(3)
>>> g.add_edges([0,1,2],[1,2,0])
>>> g.ndata['h'] = torch.ones(3, 5)
>>> g1 = dgl.DGLGraph()
>>> g1.add_nodes(4)
>>> g1.add_edges([0,1,2,3],[0,1,2,3])
>>> g1.ndata['h'] = torch.ones(4, 5) * 2
>>> large_g = dgl.batch([g, g1])
>>> large_g.ndata
{'h': tensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [2., 2., 2., 2., 2.],
        [2., 2., 2., 2., 2.],
        [2., 2., 2., 2., 2.],
        [2., 2., 2., 2., 2.]])}
>>> large_g.batch_size
2
>>> large_g.batch_num_nodes
[3, 4]
>>> large_g.batch_num_edges
[3, 4]

Note that you can add/remove nodes/edges on large_g directly, but you will receive a warning:

>>> large_g.add_nodes(5)
/Users/###/dgl/python/dgl/base.py:25: UserWarning: The graph has batch_size > 1, and mutation would break batching related properties, call `flatten` to remove batching information of the graph.
  warnings.warn(msg, warn_type)

To suppress the warning, you can call large_g.flatten() to make it a single graph:

>>> large_g.flatten()
>>> large_g.batch_size
1
>>> large_g.batch_num_nodes
[7]
>>> large_g.batch_num_edges
[7]
>>> large_g.add_nodes(5)
>>> large_g.ndata
{'h': tensor([[1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [1., 1., 1., 1., 1.],
        [2., 2., 2., 2., 2.],
        [2., 2., 2., 2., 2.],
        [2., 2., 2., 2., 2.],
        [2., 2., 2., 2., 2.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.],
        [0., 0., 0., 0., 0.]])}
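For the original question (adding edges between the sub-graphs after merging), it helps to know where each sub-graph's nodes end up. The ndata output above shows the nodes of g first, followed by the nodes of g1, so a node of g1 is shifted by the number of nodes in g. The sketch below reproduces that bookkeeping in plain Python without calling DGL; `merge_with_offsets` and the `(num_nodes, src, dst)` tuples are hypothetical names used only for illustration.

```python
def merge_with_offsets(graphs):
    """Concatenate graphs given as (num_nodes, src_list, dst_list) tuples.

    Node IDs of each graph are shifted by the total number of nodes in the
    graphs before it. Returns (total_nodes, src, dst, offsets).
    """
    src, dst, offsets = [], [], []
    total = 0
    for num_nodes, s, d in graphs:
        offsets.append(total)
        src.extend(u + total for u in s)
        dst.extend(v + total for v in d)
        total += num_nodes
    return total, src, dst, offsets


# Same structure as g and g1 in the example above.
g = (3, [0, 1, 2], [1, 2, 0])
g1 = (4, [0, 1, 2, 3], [0, 1, 2, 3])
total, src, dst, offsets = merge_with_offsets([g, g1])

# New cross-graph edge: local node 2 of g -> local node 0 of g1.
src.append(offsets[0] + 2)
dst.append(offsets[1] + 0)
print(total, src, dst)
```

On the flattened graph, the same offset arithmetic gives the node IDs to pass to add_edges.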

I hope this could satisfy your needs.

acho · March 11, 2020, 2:39pm

@zihao, thank you very much for your work and for the detailed explanation! It definitely satisfies my needs, and probably others’ too.

jun521ju · July 14, 2020, 6:08am

Hi zihao,

When merging/batching two graphs, is there a way to merge two common nodes into one node?
It seems dgl.batch() does not connect two graphs into one based on common nodes.
How can I merge two common nodes (representing ‘NY’) into one?

Each node represents a city. Since g1 and g2 share the same city, NY, I’d like the batched graph bg to have only one node representing NY.

Thanks,
Zhiju

zihao · July 15, 2020, 3:15am

Hi @jun521ju, currently we do not support such functionality. After dgl.batch, the graphs in the list are not connected.

In your case, you can assign a unique index to each node (e.g. BJ: 1, SJ: 2, NY: 3, LA: 4) and write each graph as an edge list:

g1: (4, 3)
g2: (1, 2), (2, 3)

What you need is to merge the two edge lists.

Just create a new graph given the edge list:

>>> import dgl
>>> g = dgl.graph(([4, 1, 2], [3, 2, 3]))
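The index assignment and edge-list merge can also be done programmatically. The following is a minimal sketch in plain Python, assuming each graph is available as a list of (source city, destination city) pairs; the helper `merge_by_name` is hypothetical. IDs start at 0 here, because dgl.graph infers the number of nodes from the largest ID, so 1-based indices as above leave an unused, isolated node 0 in the graph.

```python
def merge_by_name(edge_lists):
    """Merge edge lists keyed by node name into one integer edge list.

    A name that appears in several graphs (e.g. 'NY') gets a single ID,
    so the shared city becomes one node in the merged graph.
    Returns (name_to_id, src, dst).
    """
    name_to_id = {}
    src, dst = [], []
    for edges in edge_lists:
        for u, v in edges:
            # setdefault assigns the next free ID only on first sight of a name.
            src.append(name_to_id.setdefault(u, len(name_to_id)))
            dst.append(name_to_id.setdefault(v, len(name_to_id)))
    return name_to_id, src, dst


g1_edges = [("LA", "NY")]
g2_edges = [("BJ", "SJ"), ("SJ", "NY")]
name_to_id, src, dst = merge_by_name([g1_edges, g2_edges])
print(name_to_id, src, dst)

# The merged lists can then be passed to DGL as in the post above:
# g = dgl.graph((src, dst))
```

This sketch does not deduplicate edges, so an edge present in both input graphs appears twice in the result.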