About something like `pyg.data.separate.separate`

e-yi · March 26, 2024, 1:55pm

Currently, there are basically two ways to store many small graphs in a dataset with DGL. The first stores only the graph structure and constructs a DGLGraph in the dataset's __getitem__ method (e.g. QM9Dataset). The second stores the graphs as a list in a member variable of the Dataset (e.g. ZINCDataset). The first can be cumbersome, while the second may run into memory issues. With something like separate, it would be possible to store a list of graphs as a single batched graph and retrieve any one of them from the batched graph inside a dataset, without needing to unbatch the whole thing.

github.com

pyg-team/pytorch_geometric/blob/c75c7192f8048f90c8150d071a86a051d5cbb4f5/torch_geometric/data/in_memory_dataset.py#L111

    def get(self, idx: int) -> BaseData:
        # TODO (matthias) Avoid unnecessary copy here.
        if self.len() == 1:
            return copy.copy(self._data)

        if not hasattr(self, '_data_list') or self._data_list is None:
            self._data_list = self.len() * [None]
        elif self._data_list[idx] is not None:
            return copy.copy(self._data_list[idx])

        data = separate(
            cls=self._data.__class__,
            batch=self._data,
            idx=idx,
            slice_dict=self.slices,
            decrement=False,
        )

        self._data_list[idx] = copy.copy(data)

        return data
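
To make the idea concrete, here is a minimal pure-Python sketch of "store everything concatenated, keep slice offsets, cut out one graph on demand". It is not PyG's implementation: `pack`, `separate_one`, and the `(num_nodes, src, dst)` graph representation are hypothetical names chosen for illustration. Unlike the PyG call above (which passes `decrement=False`), this sketch shifts node IDs when packing, the way a batched graph would, and shifts them back when slicing.

```python
from itertools import accumulate


def pack(graphs):
    """Concatenate small graphs into one flat structure plus slice offsets.

    Each graph is a hypothetical (num_nodes, src, dst) tuple whose node IDs
    are local to that graph (0 .. num_nodes - 1).
    """
    # Prefix sums: graph i owns nodes node_ptr[i]:node_ptr[i + 1], same for edges.
    node_ptr = [0, *accumulate(n for n, _, _ in graphs)]
    edge_ptr = [0, *accumulate(len(src) for _, src, _ in graphs)]

    src_all, dst_all = [], []
    for (_, src, dst), offset in zip(graphs, node_ptr):
        # Shift local node IDs so they are unique across the whole collection.
        src_all.extend(s + offset for s in src)
        dst_all.extend(d + offset for d in dst)

    return {"src": src_all, "dst": dst_all,
            "node_ptr": node_ptr, "edge_ptr": edge_ptr}


def separate_one(packed, idx):
    """Recover graph `idx` from the packed structure without touching the others."""
    n0, n1 = packed["node_ptr"][idx], packed["node_ptr"][idx + 1]
    e0, e1 = packed["edge_ptr"][idx], packed["edge_ptr"][idx + 1]
    # Slice this graph's edges and shift node IDs back to local numbering.
    src = [s - n0 for s in packed["src"][e0:e1]]
    dst = [d - n0 for d in packed["dst"][e0:e1]]
    return n1 - n0, src, dst


graphs = [
    (3, [0, 1], [1, 2]),
    (2, [0], [1]),
    (4, [0, 1, 2], [1, 2, 3]),
]
packed = pack(graphs)
middle = separate_one(packed, 1)
```

Retrieving one graph only costs two offset lookups and a slice proportional to that graph's size, which is what makes this layout attractive for datasets with many small graphs.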

BarclayII · April 10, 2024, 12:53pm

Does dgl.slice_batch work for you?

e-yi · April 17, 2024, 3:21pm

Thanks, it may work. By the way, may I ask what prevents DGL from supporting the saving and loading of batched graphs? It would simply be much faster.

Rhett-Ying · April 18, 2024, 2:26am

A batched graph is essentially a list of DGLGraphs, so perhaps you could save them with dgl.save_graphs() together with any additional info?

e-yi · April 18, 2024, 2:38am

I mean saving batched_graphs = dgl.batch(graphs) and preserving the information needed for unbatching after loading. See "Any example for save/load single graph and batched graph?" (Issue #936, dmlc/dgl on GitHub).

Rhett-Ying · April 18, 2024, 2:53am

There is no native support for saving batched graphs for now. If you really want it, please file a feature request in the DGL repo.
