Train GCN with many graphs instead of one batch (for RL)

TheExGenesis · October 1, 2021, 4:29pm

I’m trying to train a GCN on one graph at a time. (I need this for RL, where observations arrive in a sequence.)

When I train it on one batched graph, it works fine:

Epoch 00195 | Time(s) 0.0674 | Loss 0.0003 | ETputs(KTEPS) 519.32
Epoch 00196 | Time(s) 0.0674 | Loss 0.0056 | ETputs(KTEPS) 519.42
Epoch 00197 | Time(s) 0.0674 | Loss 0.0024 | ETputs(KTEPS) 519.47
Epoch 00198 | Time(s) 0.0674 | Loss 0.0181 | ETputs(KTEPS) 519.51
Epoch 00199 | Time(s) 0.0674 | Loss 0.0019 | ETputs(KTEPS) 519.57

When I try to use one graph at a time, it doesn’t:

for g in gs:
    logits = model(g, features)
    loss = loss_fcn(logits[train_mask], labels[train_mask])
    losses.append(loss.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

Output:

Epoch 00196 | Time(s) 0.1517 | Loss 9.9473 | ETputs(KTEPS) 4.62
Epoch 00197 | Time(s) 0.1516 | Loss 10.5100 | ETputs(KTEPS) 4.62
Epoch 00198 | Time(s) 0.1515 | Loss 10.6043 | ETputs(KTEPS) 4.62
Epoch 00199 | Time(s) 0.1514 | Loss 10.0472 | ETputs(KTEPS) 4.62

I don’t know what I’m doing wrong. It looks like it’s resetting the weights with each new graph, but I’m not sure. I’m sure this has come up in the past, but I couldn’t find any solutions.

Runnable code:

https://gist.github.com/TheExGenesis/6b847c790c2d9237162b48d1f5a9d7e8

gcn_many_graphs.py

#%%
"""GCN using DGL nn package
References:
- Semi-Supervised Classification with Graph Convolutional Networks
- Paper: https://arxiv.org/abs/1609.02907
- Code: https://github.com/tkipf/gcn
"""
from random import randint
import torch
import torch.nn as nn

Rhett-Ying · October 2, 2021, 1:20am

If you’re training on one graph at a time (for g in gs), I think you’re supposed to evaluate the same way: evaluate on each graph one by one, as you trained, rather than on the whole batched graph directly. This refers to the gist “Regression with GCN. Run with `--many-graphs` to train it one graph at a time, default is one batched graph, many epoch.” on GitHub.

Could you give this a try?

TheExGenesis · October 2, 2021, 7:25pm

Since the task is regression, I’m just looking at the training loss, which is tracked per graph. Should’ve removed the evaluate line.

Also, if you run it with --many-graphs and print the logits and labels, you’ll notice the logits are very close together and have nothing to do with the labels.

TheExGenesis · October 4, 2021, 7:12pm

Any ideas? This seems like a pretty big deal

Rhett-Ying · October 8, 2021, 6:07am

Sorry for the delay; most of us are on vacation. I don’t have any ideas on this, but I’ll ask someone else to take a look.

mufeili · October 8, 2021, 6:35am

If you try replacing

for g in gs:
    features = g.ndata["feat"]
    labels = g.ndata["label"]
    train_mask = g.ndata["train_mask"]
    val_mask = g.ndata["val_mask"]
    test_mask = g.ndata["test_mask"]
    in_feats = features.shape[1]
    # n_classes = data.num_labels
    n_classes = n_classes
    n_edges = g.num_edges()

    logits = model(g, features)
    loss = loss_fcn(logits[train_mask], labels[train_mask])
    losses.append(loss.item())
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

with

all_logits = []
all_labels = []
all_train_masks = []
for g in gs:
    features = g.ndata["feat"]
    labels = g.ndata["label"]
    train_mask = g.ndata["train_mask"]
    val_mask = g.ndata["val_mask"]
    test_mask = g.ndata["test_mask"]
    in_feats = features.shape[1]
    # n_classes = data.num_labels
    n_classes = n_classes
    n_edges = g.num_edges()

    logits = model(g, features)
    all_logits.append(logits)
    all_labels.append(labels)
    all_train_masks.append(train_mask)
    # loss = loss_fcn(logits[train_mask], labels[train_mask])
    # losses.append(loss.item())
    # optimizer.zero_grad()
    # loss.backward()
    # optimizer.step()
logits = torch.cat(all_logits)
labels = torch.cat(all_labels)
train_masks = torch.cat(all_train_masks)
loss = loss_fcn(logits[train_masks], labels[train_masks])
losses.append(loss.item())
optimizer.zero_grad()
loss.backward()
optimizer.step()

You will see a result similar to the batched case. That said, I suspect this is simply because your data is randomly generated and noisy, so a larger batch size helps a lot with fast convergence.
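
One way to see why a single pooled update resembles the batched case: for a mean loss, the gradient over the concatenated nodes equals the size-weighted average of the per-graph gradients. Below is a minimal sketch in plain Python, assuming a hypothetical one-parameter model `y = w * x` and made-up toy data (none of this comes from the gist):

```python
# Toy check: the gradient of a mean-squared-error loss over concatenated
# "graphs" equals the size-weighted average of the per-graph gradients.
# The model y = w * x and the data below are hypothetical stand-ins.

def mse_grad(w, xs, ys):
    """d/dw of mean((w*x - y)^2) over one set of nodes."""
    return sum(2.0 * (w * x - y) * x for x, y in zip(xs, ys)) / len(xs)

# Three "graphs" with different numbers of nodes: (features, labels).
graphs = [
    ([1.0, 2.0], [0.5, 1.5]),
    ([3.0], [2.0]),
    ([0.5, 1.5, 2.5], [0.0, 1.0, 2.0]),
]
w = 0.3

# Gradient on the concatenation (what a single pooled update uses).
all_x = [x for xs, _ in graphs for x in xs]
all_y = [y for _, ys in graphs for y in ys]
pooled = mse_grad(w, all_x, all_y)

# Size-weighted average of per-graph gradients (gradient accumulation).
total = len(all_x)
accumulated = sum(mse_grad(w, xs, ys) * len(xs) / total for xs, ys in graphs)

print(pooled, accumulated)
```

In a per-graph training loop, the same idea would mean scaling each graph's loss by its share of training nodes, calling `backward()` per graph, and stepping the optimizer once at the end, so the logits of all graphs need not be held in memory at once.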

TheExGenesis · October 8, 2021, 9:49am

I was wondering:

Since the labels are just the product of the node’s own features “degree” (int) and “strat” (either 1 or 0), graph convolution would actually confuse this information. However, should it be impossible to represent this function? Because that’s what I’m finding.
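
On the representability question: if strat is exactly 0 or 1 and degree is bounded, the product can be written with a single ReLU applied to the node's own features, so in principle it is within reach of a node-wise network. A small sketch, assuming degrees between 0 and a hypothetical bound `M = 10` (the gist may scale features and labels differently):

```python
def relu(z):
    return max(0.0, z)

M = 10  # assumed upper bound on degree (hypothetical)

def product_via_relu(degree, strat):
    # strat == 1 -> relu(degree) = degree
    # strat == 0 -> relu(degree - M) = 0 because degree <= M
    return relu(degree - M * (1 - strat))

# Collect every (degree, strat) pair where the identity fails.
mismatches = [
    (d, s)
    for d in range(M + 1)
    for s in (0, 1)
    if product_via_relu(d, s) != d * s
]
print(mismatches)  # → []
```

Whether a GCN layer can reach this solution depends on how much of the node's own features survives neighbor aggregation, which this sketch does not address.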

logits tensor([0.2616, 0.2639, 0.2456, 0.2272, 0.2615, 0.2333, 0.2315, 0.2304, 0.2486,
        0.2493], grad_fn=<SliceBackward>)
label tensor([0.6000, 0.0000, 0.7000, 0.0000, 0.0000, 0.0000, 0.5000, 0.0000, 0.0000,
        0.0000])
label mean: 0.3467000126838684
batched loss 0.11877474188804626
many gs loss 0.11885928802921626

If so, would Graph Attention be able to? Since it learns edge weights it could learn to only care about self-loops.
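
The self-loop idea can be illustrated without any graph library. The sketch below assumes softmax-normalized attention over a node's incoming edges, including an explicit self-loop; the scores and feature values are invented for illustration:

```python
import math

def softmax(scores):
    m = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def aggregate(own, neighbors, self_score, neighbor_scores):
    """Attention-weighted sum over [self-loop] + neighbors."""
    weights = softmax([self_score] + neighbor_scores)
    values = [own] + neighbors
    return sum(w * v for w, v in zip(weights, values))

own_feature = 0.7
neighbor_features = [0.0, 0.2, 0.9]

# Equal scores: plain averaging, so the node's own value is blurred.
uniform = aggregate(own_feature, neighbor_features, 0.0, [0.0, 0.0, 0.0])
# Large self-loop score: the output is almost exactly the node's own value.
peaked = aggregate(own_feature, neighbor_features, 20.0, [0.0, 0.0, 0.0])

print(uniform, peaked)
```

This only works if the graph actually contains self-loops; if the generated graphs lack them, `dgl.add_self_loop` can add them.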

UPDATE: I tried GAT with default parameters and am seeing very similar results: it predicts around the label mean, with no sensitivity to when the label is 0 (which should be easy, since label==0 when strat==0). The edge weights don’t seem to settle on the self-loops even after dozens of epochs.

[Figure: edges weighted by attention; color is hotter based on node label (black=0, white=9)]

Here’s the gist of my current code:

https://gist.github.com/TheExGenesis/9af02ec2c1ed4b3fe6ec96d277f187b6

egt_gat.py

#%%
"""GCN using DGL nn package
References:
- Semi-Supervised Classification with Graph Convolutional Networks
- Paper: https://arxiv.org/abs/1609.02907
- Code: https://github.com/tkipf/gcn
"""
from random import randint, random
import torch
import torch.nn as nn

UPDATE 2:

Since the regression outputs are between 0 and 1, I tried using GAT with L1 loss instead of L2 and it seems to actually learn something!

logits tensor([0.3955, 0.5527, 0.0365, 0.5493, 0.0103, 0.0182, 0.0364, 0.0364, 0.1838,
        0.0342], grad_fn=<SliceBackward>)
label tensor([0.8000, 0.9000, 0.0000, 0.9000, 0.0000, 0.0000, 0.0000, 0.0000, 0.0000,
        0.0000])
label mean: 0.3655099868774414
batched loss 0.017586950212717056
many gs loss 0.07839850544929504

It’s still not perfect, and I wonder why, since it’s such an easy task.
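
One possible reason L1 behaves differently from L2 here: the best constant prediction under squared error is the label mean, while under absolute error it is the median, which is 0 when most labels are 0. A quick check on the ten labels printed above (the grid search is only an illustrative device):

```python
labels = [0.8, 0.9, 0.0, 0.9, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]

def mse(c):
    """Mean squared error of the constant prediction c."""
    return sum((c - y) ** 2 for y in labels) / len(labels)

def l1(c):
    """Mean absolute error of the constant prediction c."""
    return sum(abs(c - y) for y in labels) / len(labels)

# Search constant predictions c on a grid over [0, 1].
grid = [i / 100 for i in range(101)]
best_mse = min(grid, key=mse)
best_l1 = min(grid, key=l1)

print(best_mse, best_l1)  # → 0.26 0.0
```

A model that has not yet learned to use strat would therefore tend to drift toward the mean under MSE and toward 0 under L1, which may be part of why the L1 outputs look sharper on the zero labels.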

UPDATE 3:

Confusingly, RMSE (root of the mean squared error) should do even better than L1 on small values, but it makes the results behave the same way as MSE.
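
The RMSE observation is consistent with the square root being monotonic: RMSE and MSE rank any two sets of predictions the same way, so they share minimizers. A small check with made-up labels and candidate predictions:

```python
import math

labels = [0.6, 0.0, 0.7, 0.0, 0.0]

def mse(preds):
    return sum((p - y) ** 2 for p, y in zip(preds, labels)) / len(labels)

def rmse(preds):
    return math.sqrt(mse(preds))

candidates = [
    [0.25, 0.25, 0.25, 0.25, 0.25],  # constant output
    [0.5, 0.1, 0.6, 0.1, 0.0],       # roughly tracks the labels
    [0.0, 0.0, 0.0, 0.0, 0.0],       # always zero
]

# Indices of the candidates sorted from best to worst under each loss.
rank_mse = sorted(range(len(candidates)), key=lambda i: mse(candidates[i]))
rank_rmse = sorted(range(len(candidates)), key=lambda i: rmse(candidates[i]))
print(rank_mse, rank_rmse)
```

The gradient of RMSE is the gradient of MSE divided by twice the RMSE, so the two point in the same direction and differ only in scale, which would explain why switching to RMSE behaves like MSE rather than like L1.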

Using GCN instead of GAT doesn’t work well.

mufeili · October 9, 2021, 6:33am

Have you tried increasing the number of randomly generated graphs?

system · November 8, 2021, 6:33am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.