We’ve implemented a NodeDataLoader per the DGL example, using a small Parquet file instead of CSV.
In reality, we have about 280 Parquet files containing more than 280 million nodes in total. These nodes already have node ids that are unique within their data source (Neo4j). The NodeDataLoader appears to require an input graph whose source/destination node ids are indexed starting from zero. As far as we can tell, we cannot ‘renumber’ our node ids without reading in our entire 280-million-node Neo4j dataset and building a Neo4j-to-DGL node id mapping.
Is there a way for us to incrementally load data into NodeDataLoader, using our own DGLDataset, without renumbering all nodes to be 0-based, as DGL appears to require?
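For concreteness, the kind of Neo4j-to-DGL remapping we would like to avoid can be sketched roughly as below. This is a minimal illustration in plain Python, not our actual code: the class `IncrementalIdRemapper`, its methods, and the sample edge chunks are all hypothetical, and each chunk stands in for the edges read from one Parquet file. It assigns contiguous 0-based ids in first-seen order, one chunk at a time, but it still has to see every node id and keep the whole mapping in memory.

```python
from typing import Dict, List, Sequence, Tuple


class IncrementalIdRemapper:
    """Hypothetical helper: maps external (e.g. Neo4j) node ids to
    contiguous 0-based ids, assigned in first-seen order."""

    def __init__(self) -> None:
        self._to_local: Dict[int, int] = {}   # external id -> 0-based id
        self._to_external: List[int] = []     # 0-based id -> external id

    def local_id(self, external_id: int) -> int:
        """Return the 0-based id for external_id, assigning one if new."""
        local = self._to_local.get(external_id)
        if local is None:
            local = len(self._to_external)
            self._to_local[external_id] = local
            self._to_external.append(external_id)
        return local

    def external_id(self, local_id: int) -> int:
        """Recover the original external id from a 0-based id."""
        return self._to_external[local_id]

    def remap_edges(
        self, edges: Sequence[Tuple[int, int]]
    ) -> Tuple[List[int], List[int]]:
        """Remap one chunk of (src, dst) pairs to 0-based id lists."""
        src = [self.local_id(s) for s, _ in edges]
        dst = [self.local_id(d) for _, d in edges]
        return src, dst

    @property
    def num_nodes(self) -> int:
        return len(self._to_external)


# Made-up sample data: two chunks, each standing in for one Parquet file.
chunk_1 = [(1001, 5005), (5005, 42)]
chunk_2 = [(42, 1001), (777, 42)]

remapper = IncrementalIdRemapper()
src_1, dst_1 = remapper.remap_edges(chunk_1)
src_2, dst_2 = remapper.remap_edges(chunk_2)
```

The remapped `src`/`dst` lists are the sort of 0-based arrays that could then be handed to a graph constructor such as `dgl.graph((src, dst))`. The catch at our scale is that the dictionary grows to one entry per node, so for roughly 280 million nodes the mapping itself becomes a significant memory cost.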