Hey,
I am trying to get started with DGL by building a model that determines the maximum independent set (MIS) of a graph. Recent (and older) papers do this by assigning each node of a graph a score in [0,1] and then exploring the solution space starting from this assignment.
However, I am already struggling with the dataset preparation. The DGL documentation explains how to create a dataset for node classification and for graph classification, but the node classification example assumes there is only a single graph, which is not the case for MIS prediction. In MIS, we want to find one MIS per graph, i.e., label all vertices of a graph, but we have to train on many graphs to learn this. I don’t understand how I should build a DGLDataset that contains multiple graphs but has labels per vertex rather than per graph. Things I am not sure about include:
- How is the index in `__getitem__` mapped to a vertex/graph? Do we return all labels for all vertices of graph idx? How should the Dataset class be structured in general?
- How does batching work in this case? We have to calculate the loss using all vertex labels of a single graph, but then somehow deal with multiple losses.
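To make the two questions concrete, here is a minimal, library-free sketch of one possible layout. It deliberately does not use the DGL API: `ToyMISDataset`, `collate`, and `mean_node_loss` are hypothetical names, and the squared-error loss is only a placeholder. The assumption is that one dataset item is one whole graph together with the labels of all its vertices, and that a batch is the disjoint union of several graphs, so a single loss can be taken over all vertices in the batch.

```python
from typing import List, Tuple

Edge = Tuple[int, int]
# One sample = (number of nodes, edge list, one 0/1 label per node).
Sample = Tuple[int, List[Edge], List[int]]


class ToyMISDataset:
    """Hypothetical stand-in for a dataset class: one item is one whole graph."""

    def __init__(self, samples: List[Sample]):
        self.samples = samples

    def __len__(self) -> int:
        # Number of graphs, not number of nodes.
        return len(self.samples)

    def __getitem__(self, idx: int) -> Sample:
        # Graph idx together with the labels of all of its nodes.
        return self.samples[idx]


def collate(batch: List[Sample]):
    """Merge several graphs into one disjoint union by shifting node ids."""
    edges, labels, sizes = [], [], []
    offset = 0
    for num_nodes, graph_edges, node_labels in batch:
        edges += [(u + offset, v + offset) for u, v in graph_edges]
        labels += node_labels
        sizes.append(num_nodes)
        offset += num_nodes
    return edges, labels, sizes


def mean_node_loss(scores: List[float], labels: List[int]) -> float:
    """Placeholder loss: squared error averaged over every node in the batch."""
    return sum((s - y) ** 2 for s, y in zip(scores, labels)) / len(labels)


# Path 0-1-2 with MIS {0, 2}; single edge 0-1 with one possible MIS {0}.
dataset = ToyMISDataset([
    (3, [(0, 1), (1, 2)], [1, 0, 1]),
    (2, [(0, 1)], [1, 0]),
])
edges, labels, sizes = collate([dataset[0], dataset[1]])
loss = mean_node_loss([0.5] * len(labels), labels)
print(edges, labels, sizes, loss)
```

As far as I understand, `dgl.batch` performs a merge of this kind on DGL graphs and concatenates the node data, so per-vertex labels stored on each graph would end up as one label vector for the batch. Whether a single averaged loss like this is the intended approach in DGL is exactly what I am unsure about.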
Any help/pointers are very much appreciated. Thank you very much.
Best,
Maximilian