I need to load a huge graph, held in a pandas DataFrame or NumPy array, into an existing empty DGLGraph instance (or add its nodes and edges in batches). Because of the Frame append operations in DGLGraph.add_nodes() (see dgl/heterograph.py at 7b766393f8923f4a171fc1262aa5455d48996ace · dmlc/dgl · GitHub), the node data/attributes end up stored twice: once in my loaded pandas DataFrame and once in the DGL node Frame. This wastes gigabytes of system RAM.
My code looks like this:
import dgl, torch
graph = dgl.DGLGraph()
# df is the already-loaded pandas DataFrame of node features; num is the node count
graph.add_nodes(num, data={"something": torch.tensor(df.values)})
Is there any way to add new nodes and edges without this double RAM usage, for example another interface of dgl.DGLGraph, or a custom function or class?
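To make the copying concrete, here is a minimal sketch that uses only NumPy and PyTorch. The array name `features` is a hypothetical stand-in for the loaded node data. It shows that `torch.tensor()` allocates a new buffer, while `torch.from_numpy()` reuses the existing one. This only covers the conversion step; it does not by itself say anything about further copies made inside add_nodes().

```python
import numpy as np
import torch

# Hypothetical stand-in for the loaded node features (e.g. from df.to_numpy()).
features = np.arange(12, dtype=np.float32).reshape(4, 3)

copied = torch.tensor(features)      # allocates new memory and copies the data
shared = torch.from_numpy(features)  # wraps the existing NumPy buffer, no copy

# Mutate the source array and check which tensor observes the change.
features[0, 0] = 100.0
copied_sees_change = bool(copied[0, 0] == 100.0)
shared_sees_change = bool(shared[0, 0] == 100.0)
print(copied_sees_change, shared_sees_change)
```

Only the tensor built with `torch.from_numpy()` reflects the in-place change, which indicates it shares memory with the source array rather than holding a second copy.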