Since I am constructing a covariance matrix over all graphs in my dataset, I have to recombine the nodes of every graph-graph pair, keep their node data, and construct new in-between edges. For that I merge two (or more) graphs with this function:

```
import itertools

import dgl
import torch


def merge_graphs(graphs, keep_edges=False, create_new_inbetween_edges=True):
    """Merge two or more graphs into one.

    Arguments
    ---------
    graphs : list of dgl.DGLGraph objects
        The graphs, which will be merged into one graph.

    Returns
    -------
    (dgl.DGLGraph)
        A merged DGLGraph with the same node data as the original graphs.

    Author
    ------
    Maximillian F. Vording

    Inspiration
    -----------
    njchoma
    url: https://discuss.dgl.ai/t/best-way-to-send-batched-graphs-to-gpu/171/6
    """
    g_merged = dgl.DGLGraph(graph_data=dgl.batch(graphs))
    # Copy the node features of the original graphs into the merged graph.
    labels = graphs[0].node_attr_schemes()
    for l in labels.keys():
        g_merged.ndata[l] = torch.cat([g.ndata[l] for g in graphs], 0)
    # Either keep the original edges (and their features) or drop them all.
    if keep_edges:
        labels = graphs[0].edge_attr_schemes()
        for l in labels.keys():
            g_merged.edata[l] = torch.cat([g.edata[l] for g in graphs], 0)
    else:
        g_merged.remove_edges(list(range(g_merged.number_of_edges())))
    if create_new_inbetween_edges:
        # After batching, the nodes of graph i are offset by the total node
        # count of all preceding graphs; compute each graph's new id range.
        num_nodes = [g.number_of_nodes() for g in graphs]
        new_node_inds = [
            list(range(sum(num_nodes[:i]), sum(num_nodes[:i]) + num_nodes[i]))
            for i in range(len(num_nodes))
        ]
        # Fully connect every pair of graphs with new in-between edges.
        for a, b in itertools.combinations(range(len(graphs)), 2):
            src, dst = zip(*itertools.product(new_node_inds[a],
                                              new_node_inds[b]))
            g_merged.add_edges(src, dst)
    return g_merged
```
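To make the index bookkeeping behind `create_new_inbetween_edges` concrete, here is a dgl-free sketch of how the per-graph node id ranges and the cross-graph edge list come out (the two toy node counts are made up for illustration):

```python
import itertools

# Toy example: two graphs with 2 and 3 nodes. After batching, graph i's
# nodes are offset by the total node count of all earlier graphs.
num_nodes = [2, 3]
offsets = [sum(num_nodes[:i]) for i in range(len(num_nodes))]
node_ids = [list(range(off, off + n)) for off, n in zip(offsets, num_nodes)]

# For every pair of graphs, add all cross-graph (source, destination) edges.
cross_edges = []
for a, b in itertools.combinations(range(len(num_nodes)), 2):
    cross_edges.extend(itertools.product(node_ids[a], node_ids[b]))

print(node_ids)     # [[0, 1], [2, 3, 4]]
print(cross_edges)  # [(0, 2), (0, 3), (0, 4), (1, 2), (1, 3), (1, 4)]
```

The `zip(*...)` in the function above just transposes this pair list into the separate source and destination lists that `add_edges` expects.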

I run into problems because `dgl.batch()` does not preserve the reference to the original nodes and their tensors, so I have to reconstruct the merged graphs every time the tensors on the original graphs are updated, i.e. in each epoch. I also want to make sure that updates are consistent and shared between my `BatchedDGLGraph` and the original graphs in my dataset object, without having to set them explicitly as you suggest under BatchedDGLGraph/Update attributes, since that removes the common tensor reference that the merged graphs rely on.
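As I understand it, the reference is lost because `torch.cat` allocates a fresh tensor rather than a view, so the merged graph's `ndata` no longer aliases the originals. A minimal stdlib analogy of that copy semantics (plain lists standing in for tensors):

```python
# Concatenation builds a new container with copied values, so later
# in-place updates to the originals are not visible in the copy --
# torch.cat behaves the same way for tensors.
a = [0.0, 0.0]
b = [1.0, 1.0]
merged = a + b   # new list, values copied at this moment
a[0] = 99.0      # in-place update of the original
print(merged[0]) # still 0.0 -- the copy did not follow the update
```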

How can I make sure that the node data refers back to the tensors in the original graphs without having to set them explicitly after each update?

I considered using `dgl.DGLSubGraph` instead, but since it does not support sharing of node/edge features for now, I'm not sure how to make that work either. When will sharing be supported?

I hope my question makes sense; if not, I can elaborate with more code and explanations.

Thanks in advance