Node Classification over Multiple Graphs

Hey,

I am currently trying to get started using DGL, by building a model that determines the maximum independent set of a graph. Recent (and older) papers do this by classifying each node of a graph with a score in [0,1], and then explore the solution space from this assignment.

However, I already struggle with the dataset preparation. The DGL documentation states how to create a dataset for node classification and graph classification. However, the node classification example assumes there only is a single graph, which is not true for MIS prediction. In MIS, we want to find one MIS per graph, i.e., label all vertices of a graph, but we have to train on a lot of graphs to learn this. I don’t understand how I should build a DGLDataset that contains multiple graphs, but labels per vertex on a graph, not per graph. Things I am not sure about include:

  1. how is the index in __getitem__ mapped to a vertex/graph? Do we return all labels for all vertices of graph idx? How should the Dataset class be structured in general?

  2. How does batching work in this case? Because we will have to calculate the loss using all vertex labels of a single graph, but then somehow deal with multiple losses.

Any help/pointers are very much appreciated. Thank you ver ymuch.

Best,
Maximilian

For node classification, let’s say you have two graphs, one with 3 nodes and one with 5 nodes. If __getitem__ receives a value i in \{0, 1, 2\}, it should return the label of the corresponding node in the first graph. If it receives a value i in \{3, 4, 5, 6, 7\}, it should return the label of node i - 3 in the second graph. The high level idea is that we can index the nodes in all graphs as if we batch all the graphs. When __getitem__ receives a node ID, it will first map the ID to the graph ID and then the node ID in the corresponding graph. You can perform loss computation in a similar fashion.

Thank you very much for your reply!

But does this mean that already within the DGLDataset implementation, we batch our several graphs into one graph that consists of multiple independent components? Because what I got from the documentation, e.g. in snippets like this:

from dgl.dataloading import GraphDataLoader
dataloader = GraphDataLoader(
    dataset,
    batch_size=1024,
    drop_last=False,
    shuffle=True)

was that the dataset should not define batches themselves, because the GraphDataLoader is responsible for batching multiple graphs.

The issue I see currently is that within our training loop, we need to have something along these lines:

for epoch in epochs:
    for graph in graphs:
       train()

And if I just have an incremental mapping of node to graph, we have an issue, as the training loop doesn’t know whether a node ID maps to a new graph or not. So maybe the problem wasn’t too clear: For the MWIS, we classify each node of a graph, but as a group, i.e., each node is either path of the set of vertices we are looking for or it is not. Hence, the loss is calculated per graph, but we need to classify each node separately, in order to derive the loss. It’s not something we do node-by-node.

This is why I thought __getitem__(i) should return a list of all labels for graph i, instead of “flattening” this labelmapping. But this is not what you suggest.

Thank you very much again for your help. I hope I made my problem more clear now.

I think GraphDataLoader is more about graph property prediction with multiple graphs. Another possibility is to have one dataset object per graph. During training, you can first shuffle the dataset objects and then iterate over the dataset objects as you did in the for loop.

1 Like

Hey,

I implemented the model as described in the original paper [1] that introduced the idea of calculating the MIS using a GCN. As you suggested, I have one dataset object per graph, and iterate over the numerous dataset objects. The training loop can be seen in [2], and the basic GCN as in the DGL documentation (+ hindsight loss) in [3].

However, the training is very slow, and as a newbie to DGL, I am not too sure on how to debug this. In our sever, we have a Tesla T4. The authors of [1] train on a Titan GPU, and claim to train 200 epochs of 38000 graphs in 16 hours. They implemented their model on top on the original GCN implementation in Tensorflow 1. For me, a single epoch in DGL takes around 2 hours (with 2 instead of 32 outputs, so less computation!) (cuda is enabled and works).

Could this be an issue with my training loop? I don’t think that the original GCN implementation is much more efficient than DGL. Any idea on how to approach an issue like this is very much appreciated, because it’s like a blackbox that for some reason is slow to me, currently.

Thank you so much for your help.

Best,
Maximilian

[1] [1810.10659] Combinatorial Optimization with Graph Convolutional Networks and Guided Tree Search
[2] import argparseimport timeimport numpy as npimport networkx as nximport - Pastebin.com
[3] import torchimport torch.nn as nnfrom dgl.nn.pytorch import GraphConvimpor - Pastebin.com

Hi Maximilian,

  1. How dense are the graphs?
  2. Did the authors release the code? Do you know if they train over one graph a time? My guess is that you probably still need to batch multiple graphs for each iteration. One possibility is that for each iteration you batch a list of randomly sampled graphs and keep the original membership of the nodes for later loss computation over graphs.

Hey,

thank you for your answer!

  1. Is there a density measure you want to know? Other than that, the graphs are converted SAT instances (i.e. generated by reducing a SAT instance to a MIS problem on the graph), and have ~1200 vertices each (because we have 400 clauses with 3 literals each), where the three vertices that represent a clause are all connected, and the vertices of the inverted literals are connected as well. So I’d say it’s rather dense. However, it’s exactly the same training data than for the original paper, so it (in theory) shouldn’t be the root cause of the issue.
  1. Yes, the authors released the code: NPHard/train.py at master · intel-isl/NPHard · GitHub While I think it’s kinda hard to read, in the paper, they state they “use Adam with single-graph mini-batches […}. Training proceeds for 200 epochs and takes about 16 hours on a desktop…”. I interpret single-graph mini batches that they did it the same way like I did it in my code. So I could try to batch, but a) I’m not sure whether it would work, as I believe the SAT training graphs could in theory also contain multiple unconnected components, such that the network couldn’t understand what the difference between another graph and a connected component is (if necessary, I’m not sure about that one) and b) I still think maybe there is an issue in my implementation (even though it’s very simple) because I just don’t believe that DGL in the same setting would be that much slower than the original 2017 Tensorflow1 GCN code the author’s repository uses…

But already thank you so much for your help.

Best,
Maximilian

  1. How many edges are there in a graph of 1200 vertices?
  2. Have you tried running the authors’ code? Did you see the same speed? The hardware you have might also be different from the one the authors have.

Hi. I have a question related to this. Is it possible to train a GNN for link prediction on multiple graphs instead of just one? Thank you.

Absolutely. You can batch multiple graphs, and then training a link prediction model on multiple graphs will be the same as training a link prediction model on a single graph.

1 Like