I saw there is a tutorial on stochastic training on large graphs here, and there are also GNN examples implemented in TF2 here. Is it possible to run the stochastic training / minibatching in TF2? Thanks
It should be possible, but unfortunately we don’t have TF2 counterparts for EdgeDataLoader because we are not very familiar with how custom minibatching works in TF2. If you could point us to a reference on how custom minibatching works in TF2, we can figure out how to do minibatch GNN training together.
In TF2 we can use a class called tf.data.Dataset to wrap our data and yield a NumPy iterator using a method called as_numpy_iterator (the examples are on the same page in the documentation).
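A minimal sketch of that API, assuming toy in-memory NumPy arrays (the names `features` and `labels` are made up for illustration): `from_tensor_slices` turns each row into one dataset element, `shuffle` and `batch` group the elements into minibatches, and `as_numpy_iterator` yields each minibatch as NumPy arrays instead of tf.Tensors.

```python
import numpy as np
import tensorflow as tf

# Toy data for illustration only: 6 samples with 4 features each.
features = np.arange(24, dtype=np.float32).reshape(6, 4)
labels = np.array([0, 1, 0, 1, 1, 0], dtype=np.int64)

# Each dataset element is one (feature_row, label) pair.
dataset = tf.data.Dataset.from_tensor_slices((features, labels))

# Shuffle with a fixed seed, then group elements into minibatches of 2.
dataset = dataset.shuffle(buffer_size=6, seed=0).batch(2)

# as_numpy_iterator() converts every minibatch from tf.Tensor to np.ndarray.
batches = list(dataset.as_numpy_iterator())
for x, y in batches:
    print(type(x).__name__, x.shape, y.shape)  # ndarray (2, 4) (2,)
```

The same pattern works for any nested structure of arrays (tuples or dicts), which is what a custom minibatching loop in TF2 would iterate over.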
Also, regarding training with large graphs, can we construct the graph object in batches? My dataset contains multiple graphs saved in TFRecord format. I will read it using the tf.data.Dataset class above and yield a NumPy iterator. In the TFRecords, each row represents one graph, so each batch from the iterator represents one row, or in other words, one graph. Since I guess we need to construct one DGLGraph object consisting of all the training graphs and another DGLGraph object for testing (in an inductive learning setting), I guess I need to build the DGLGraph object in batches fed from the iterator. Is the problem explanation clear enough? Thanks a lot!
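One way to read "one graph per row" TFRecords with tf.data, sketched under an assumed schema: each record is a `tf.train.Example` that stores the graph's edge list as two int64 lists under the hypothetical keys `src` and `dst`. To keep the demo self-contained the records are serialized in memory; with real files one would start from `tf.data.TFRecordDataset(filenames)` instead of `from_tensor_slices`.

```python
import numpy as np
import tensorflow as tf

def make_example(src, dst):
    # Hypothetical schema: one graph per record, edges stored as two int64 lists.
    return tf.train.Example(features=tf.train.Features(feature={
        "src": tf.train.Feature(int64_list=tf.train.Int64List(value=src)),
        "dst": tf.train.Feature(int64_list=tf.train.Int64List(value=dst)),
    })).SerializeToString()

# Two toy graphs serialized in memory (stand-in for TFRecord files on disk).
records = [make_example([0, 1], [1, 2]), make_example([0], [1])]

# Edge lists have different lengths per graph, so parse them as variable-length.
feature_spec = {
    "src": tf.io.VarLenFeature(tf.int64),
    "dst": tf.io.VarLenFeature(tf.int64),
}

def parse(record):
    parsed = tf.io.parse_single_example(record, feature_spec)
    # VarLenFeature yields SparseTensors; densify them into 1-D int64 tensors.
    return {k: tf.sparse.to_dense(v) for k, v in parsed.items()}

dataset = tf.data.Dataset.from_tensor_slices(records).map(parse)

# Each element is one graph: a dict of NumPy edge arrays.
graphs = list(dataset.as_numpy_iterator())
print(graphs[0]["src"], graphs[0]["dst"])  # [0 1] [1 2]
```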
dgl.batch (see the DGL 0.8 documentation) can be used to batch graphs in DGL.
One question I have, though: does as_numpy_iterator support yielding non-array elements? DGLGraphs are not NumPy arrays or tensors; they are more complicated than that.
What do you mean by non-array elements?