How to Train and Validate a GCN on Several Graphs?

MilkshakeForReal · December 22, 2019, 4:53am

I can only give you some hints.

import dgl
import torch as th
from torch.utils.data import DataLoader

def run_a_train_epoch(epoch, model, data_loader,
                      loss_criterion, optimizer):
    model.train()
    for batch_id, batch_data in enumerate(data_loader):
        bg, labels = batch_data
        prediction = model(bg)
        loss = loss_criterion(prediction, labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()

def collate_graphs(data):
    # data is a list of (graph, label) pairs; labels are expected to be tensors
    graphs, label = map(list, zip(*data))
    bg = dgl.batch(graphs)  # merge the graphs into one batched graph
    labels = th.stack(label, dim=0)
    return bg, labels

train_loader = DataLoader(dataset=train_set,
                          batch_size=xx,  # placeholder: choose your batch size
                          shuffle=True,
                          collate_fn=collate_graphs)

mufeili · December 22, 2019, 5:54pm

Take a look at this implementation. To use built-in modules, just replace the GCN class in the tutorial with this implementation.

ghaffarian · December 25, 2019, 1:58pm

So here is what I’ve implemented so far:

1) The GCN:

import torch
from dgl.nn.pytorch import GraphConv

class GCN(torch.nn.Module):

    def __init__(self, in_feats, n_hidden, n_classes, n_layers, activation, dropout):
        super(GCN, self).__init__()
        self.layers = torch.nn.ModuleList()
        # input layer
        self.layers.append(GraphConv(in_feats, n_hidden, activation=activation))
        # hidden layers
        for i in range(n_layers - 1):
            self.layers.append(GraphConv(n_hidden, n_hidden, activation=activation))
        # output layer
        self.layers.append(GraphConv(n_hidden, n_classes))
        self.dropout = torch.nn.Dropout(p=dropout)

    def forward(self, g):
        h = g.ndata['vec']
        for i, layer in enumerate(self.layers):
            if i != 0:
                h = self.dropout(h)
            h = layer(g, h)
        return h

Note: the tensor data for each node is stored in an attribute named 'vec' (graph.ndata['vec']).

2) Loading Data and Training GCN Model:

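import dgl
import torch.nn.functional as func
from torch.utils.data import DataLoader

def collate(samples):
    # samples is a list of (graph, label) pairs
    graphs, labels = map(list, zip(*samples))
    batched_graph = dgl.batch(graphs)
    return batched_graph, torch.tensor(labels)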

dataset = MyGraphDataset()
data_loader = DataLoader(dataset, batch_size=32, shuffle=True, collate_fn=collate)

hidden_units = 16
hidden_layers = 1
dropout = 0.5
gcn_model = GCN(dataset.features_size, hidden_units, dataset.num_classes, hidden_layers, func.relu, dropout)
loss_func = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(gcn_model.parameters(), lr=0.01, weight_decay=5e-4)
gcn_model.train()

train_epochs = 20
for epoch in range(train_epochs):
    epoch_loss = 0
    gcn_model.train()
    for batch_id, (bg, label) in enumerate(data_loader):
        prediction = gcn_model(bg)
        loss = loss_func(prediction, label)  #  <---------   ERROR HERE!
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        epoch_loss += loss.detach().item()
    epoch_loss /= (batch_id + 1)
    print('Epoch {}, loss {:.4f}'.format(epoch, epoch_loss))

Note that the collate function differs from the one you proposed; it is adapted from this DGL tutorial on using GCN for classification.

But I’m getting the error below at the marked line:

ValueError: Expected input batch_size (2921) to match target batch_size (32).

Note that the number 2921 changes on almost every run. Here is the error traceback:

Traceback (most recent call last):
  File "/home/seyed/Python-Projects/DGL-Test/test.py", line 181, in <module>
    loss = loss_func(prediction, label)
  File "/opt/anaconda3/envs/dgl-test/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/opt/anaconda3/envs/dgl-test/lib/python3.6/site-packages/torch/nn/modules/loss.py", line 916, in forward
    ignore_index=self.ignore_index, reduction=self.reduction)
  File "/opt/anaconda3/envs/dgl-test/lib/python3.6/site-packages/torch/nn/functional.py", line 2009, in cross_entropy
    return nll_loss(log_softmax(input, 1), target, weight, None, ignore_index, None, reduction)
  File "/opt/anaconda3/envs/dgl-test/lib/python3.6/site-packages/torch/nn/functional.py", line 1836, in nll_loss
    .format(input.size(0), target.size(0)))

Any clues?

VoVAllen · December 25, 2019, 4:18pm

The reason is that prediction holds the node representations for the BatchedGraph, one row per node. You need to use a readout function such as sum_nodes to convert it into a graph-level representation.

Replace loss = loss_func(prediction, label) with the lines below:

bg.nodes['hv'] = prediction
graph_repr = dgl.sum_nodes(bg, 'hv')
loss = loss_func(graph_repr, label)
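To illustrate what the readout does, here is a framework-free sketch in plain Python. It does not call DGL; `sum_readout` and the toy numbers are hypothetical and only mimic the idea. A batched graph stacks the node rows of all its graphs, so summing each graph's rows gives one row per graph, and the number of rows then matches the number of labels in the batch.

```python
def sum_readout(node_rows, nodes_per_graph):
    """Sum node-level rows into one row per graph.

    node_rows: per-node vectors for the whole batch, graph after graph.
    nodes_per_graph: number of nodes in each graph of the batch.
    """
    graph_rows, start = [], 0
    for n in nodes_per_graph:
        segment = node_rows[start:start + n]
        # Column-wise sum over the nodes that belong to this graph.
        graph_rows.append([sum(col) for col in zip(*segment)])
        start += n
    return graph_rows

# Toy batch: 2 graphs with 3 and 2 nodes, 2 classes per node.
node_logits = [[1.0, 0.0], [0.5, 0.5], [0.0, 1.0],   # graph 0
               [2.0, -1.0], [1.0, -1.0]]             # graph 1
graph_logits = sum_readout(node_logits, [3, 2])
print(len(node_logits), "node rows ->", len(graph_logits), "graph rows")
print(graph_logits)  # [[1.5, 1.5], [3.0, -2.0]]
```

This is why the ValueError reports a number like 2921: it is the total node count of the 32 graphs in the batch, which varies with shuffling.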

ghaffarian · December 26, 2019, 10:47am

I had to change your first line to bg.ndata['hv'] = prediction because using bg.nodes['hv'] I got the following error:

TypeError: 'NodeView' object does not support item assignment

Now the program executes without errors, but I’m not getting good results! The epoch loss is not decreasing enough! Here is a sample output with 128 epochs:

epoch #001, loss = 1.6121
epoch #002, loss = 1.3028
epoch #003, loss = 0.7786
epoch #004, loss = 0.6937
epoch #005, loss = 0.6930
epoch #006, loss = 0.6553
epoch #007, loss = 0.6367
...
epoch #120, loss = 0.5716
epoch #121, loss = 0.6556
epoch #122, loss = 0.6175
epoch #123, loss = 0.5681
epoch #124, loss = 0.5405
epoch #125, loss = 0.5682
epoch #126, loss = 0.5562
epoch #127, loss = 0.6039
epoch #128, loss = 0.5859

Here is my final training code:

hidden_units = 16
hidden_layers = 1
dropout = 0.5
gcn_model = GCN(data.features_size, hidden_units, data.num_classes, hidden_layers, func.relu, dropout)
loss_func = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(gcn_model.parameters(), lr=0.01, weight_decay=5e-4)
gcn_model.train()

train_epochs = 128
for epoch in range(train_epochs):
    epoch_loss = 0
    gcn_model.train()
    for batch_id, (batched_graphs, batched_labels) in enumerate(data_loader):
        prediction = gcn_model(batched_graphs)
        batched_graphs.ndata['hv'] = prediction
        graph_repr = dgl.sum_nodes(batched_graphs, 'hv')
        loss = loss_func(graph_repr, batched_labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        epoch_loss += loss.detach().item()
    epoch_loss /= (batch_id + 1)
    print('epoch #%03d, loss = %.4f' % (epoch + 1, epoch_loss))

Is my training code correct? How about the model params?
If the code & params are OK, then maybe GCN is not suitable for my data?

VoVAllen · December 26, 2019, 11:30am

Hi,

Loss may not be a good metric; what is your accuracy? You could also try GAT or other readout functions.
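As a rough sketch of how accuracy could be computed from graph-level outputs, the plain-Python helper below takes one row of class scores per graph and compares the highest-scoring class with the true label. The function name `accuracy` and the toy values are hypothetical, not taken from the training code in this thread.

```python
def accuracy(graph_logits, labels):
    """Fraction of graphs whose highest-scoring class equals the label."""
    predicted = [max(range(len(row)), key=row.__getitem__)
                 for row in graph_logits]
    correct = sum(int(p == y) for p, y in zip(predicted, labels))
    return correct / len(labels)

# Toy example: 4 graphs, 2 classes; one row of graph-level scores per graph.
toy_logits = [[0.3, -0.1], [3.0, -2.0], [-0.2, 0.4], [0.1, 0.9]]
toy_labels = [0, 0, 1, 0]
print(accuracy(toy_logits, toy_labels))  # 0.75
```

The rows must be graph-level scores (after the readout), not per-node scores, and the metric is most informative on graphs that were held out from training.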

ghaffarian · December 26, 2019, 11:52am

1) Validation
I agree that the loss value may not be a good metric. With scikit-learn, performing a 5-fold cross-validation or calculating metrics for the model’s output on test data is very easy:

# 5-fold cross-validation
from sklearn.model_selection import cross_val_score
cv_scores = cross_val_score(model, features, labels, cv=5)

# Manual train/test split & validation
from sklearn.model_selection import train_test_split
from sklearn.metrics import precision_recall_fscore_support
train_features, test_features, train_labels, test_labels = train_test_split(features, labels, test_size=0.2)

model = SomeClassificationModel(...)
model.fit(train_features, train_labels)
model_output = model.predict(test_features)
metrics = precision_recall_fscore_support(test_labels, model_output)

I didn’t find any easy way to do such a train/test split and validation using DGL.
I did see some code using a train_mask and test_mask, but I really didn’t understand what was going on!
Can you provide some help based on my code?

2) Alternative Models
I am definitely going to test some other, more powerful models (like GAT, SAGE, …), but I need the best possible results for GCN as a baseline for GNN models.

aah71 · December 27, 2019, 12:54pm

@ghaffarian Where is the message passing process taking place in your implementation? In the tutorial they use the forward function:

def forward(self, g, feature):
    # Initialize the node features with h.
    g.ndata['h'] = feature
    g.update_all(msg, reduce)
    g.apply_nodes(func=self.apply_mod)
    return g.ndata.pop('h')

where each node’s features are updated by averaging the neighboring nodes’ features.

In the issue I raised (Does the current Batched Graph Classification source code support edge features?), I mentioned that the problem I am encountering with the graph classification tutorial is that I obtain the same class probabilities for all elements of the batch (for example: 0.2, 0.3, 0.25, 0.25 for every graph in the batch).

When I tried your method this wasn’t the case. However, when testing, it seemed that you are making a prediction for each node individually (I got prediction size = input size * number of nodes in each graph), but it could also be something I did wrong.

Anyway, if you succeed with your model, please do share it; I have the same challenge for graph classification.

ghaffarian · December 27, 2019, 1:10pm

@aah71 You can see the implementation by referring to post #6 in this topic. The forward method is implemented as you can see; and the rest of the message passing process is handled by using the built-in GraphConv module which contains the implementation for a single-layer of a GCN.

Of course I’m not very sure about the correctness of my implementation! I’m still learning about GNN concepts and DGL. My current challenge is to implement the validation process and see how good or bad the results are.

Please let me know if you find any issues in the implementation.

mufeili · December 27, 2019, 4:46pm

train_mask, valid_mask and test_mask are binary masks. For example, train_mask[i] == 1 indicates that i is a training node. The reason we use these masks is that datasets like Cora, Citeseer and Pubmed consist of one graph only. For node classification, we perform message passing over the whole graph to update representations of all nodes and these masks are used for loss computation/model evaluation on the nodes in the training/validation and test sets.
For graph-level prediction, it’s common that we need to perform a random split over a list of graphs. You may find this function to be helpful: https://github.com/dmlc/dgl/blob/master/python/dgl/data/utils.py#L45
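For graph-level tasks, the split is over whole graphs rather than over nodes. The sketch below shows one way to do a random hold-out split and a k-fold split over sample indices in plain Python. It is not the DGL helper linked above; `split_indices`, `k_fold_indices`, and the stand-in `samples` list are hypothetical names used only for illustration.

```python
import random

def split_indices(n_samples, test_ratio=0.2, seed=0):
    """Shuffle sample indices and cut off a test portion."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_test = int(round(n_samples * test_ratio))
    return indices[n_test:], indices[:n_test]

def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, valid_indices) for each of k folds."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, folds[i]

# Stand-ins for (graph, label) pairs; real code would hold DGL graphs here.
samples = [("graph%d" % i, i % 2) for i in range(10)]
train_idx, test_idx = split_indices(len(samples), test_ratio=0.2)
train_set = [samples[i] for i in train_idx]
test_set = [samples[i] for i in test_idx]
print(len(train_set), len(test_set))  # 8 2
```

Each resulting list can then be wrapped in its own DataLoader with the same collate function, so that training and evaluation never see the same graph.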

ghaffarian · December 28, 2019, 3:54pm

@mufeili @VoVAllen
I’m having trouble interpreting the model output for evaluation …

# define model
features_size = 256
hidden_units = 16
num_classes = 2
hidden_layers = 1
dropout = 0.5
model = GCN(features_size, hidden_units, num_classes, hidden_layers, func.relu, dropout)
# set training mode
model.train()
'... rest of training code in post #8 of this topic ...'
# set evaluation mode
model.eval()
# call the model on a single test graph
out = model(g)
# check the output
print(type(out))
print(out.shape)
print(out)

This gave the following result:

<class 'torch.Tensor'>
torch.Size([15, 2])
tensor([[ 2.5688e-02, -2.5688e-02],
        [ 2.5688e-02, -2.5688e-02],
        [ 2.5688e-02, -2.5688e-02],
        [ 2.5688e-02, -2.5688e-02],
        [ 2.5688e-02, -2.5688e-02],
        [ 2.5688e-02, -2.5688e-02],
        [ 2.5688e-02, -2.5688e-02],
        [ 2.5688e-02, -2.5688e-02],
        [ 2.5688e-02, -2.5688e-02],
        [ 2.5688e-02, -2.5688e-02],
        [ 1.4805e-02, -2.0951e-02],
        [ 1.9604e-02, -2.3040e-02],
        [ 2.8611e-04, -2.2678e-02],
        [-5.7971e-02, -2.6015e-02],
        [ 2.2193e+00, -1.7411e+00]], grad_fn=<AddBackward0>)

As mentioned before, my problem is a binary classification, so I expect to have either a single value as output (0 or 1), or two probability values which indicate the model’s confidence in assigning the input to each of the two classes.

But as you can see, I have an output with shape (15 x 2)!!
How should I convert this into my desired output label (0 or 1)?

mufeili · December 28, 2019, 7:16pm

To get probabilities, we typically apply a softmax over the classes to the raw output of the model, often referred to as logits. My guess is that these are logits for class 0 and 1.

The loss computation is typically performed at the level of logits for the consideration of numerical stability.
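A small plain-Python sketch of softmax over one row of logits may help here. It subtracts the largest logit before exponentiating, which is a common way to avoid overflow; this is an illustration of the idea, not PyTorch's actual implementation, and the input values are arbitrary.

```python
import math

def softmax(logits):
    """Turn one row of logits into probabilities that sum to 1."""
    m = max(logits)                       # shift so the largest exponent is 0
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

probs = softmax([2.0, -1.0])
print(probs, sum(probs))

# Without the shift, math.exp(1000.0) would raise OverflowError.
print(softmax([1000.0, 0.0]))  # [1.0, 0.0]
```

Because exponentials over- and underflow easily, losses such as torch.nn.CrossEntropyLoss take the logits directly and apply a log-softmax internally, as the traceback earlier in this thread shows.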

ghaffarian · January 2, 2020, 8:56am

@VoVAllen @mufeili I’m having trouble understanding how to use softmax to obtain probabilities for my binary classification problem.

Here is the code I have:

output = gcn_model(single_test_graphs)
print(type(output), 
      output.shape, 
      output)

This is the output:

<class 'torch.Tensor'> 
torch.Size([18, 2]) 
tensor([[ 0.0212, -0.0212],
        [ 0.0545, -0.0615],
        [ 0.0212, -0.0212],
        [ 0.0545, -0.0615],
        [ 0.0212, -0.0212],
        [ 0.0212, -0.0212],
        [ 0.0212, -0.0212],
        [ 0.0153, -0.0150],
        [ 0.0212, -0.0212],
        [ 0.0212, -0.0212],
        [ 0.0212, -0.0212],
        [ 0.0212, -0.0212],
        [-0.1575,  0.0535],
        [ 0.0212, -0.0212],
        [ 0.0545, -0.0615],
        [ 0.1218, -0.2458],
        [-0.5680,  0.3171],
        [ 0.0212, -0.0212]], grad_fn=<AddBackward0>)

As you can see, the model output is of shape 18x2 (the first dimension is not fixed and changes in different executions). I need to have only 2 probabilities for the 2 classes.

So I tried to use softmax on the model output like this:

softmax_output = torch.softmax(output, dim=1)
print(type(softmax_output),
      softmax_output.shape,
      softmax_output)

And here is the output:

<class 'torch.Tensor'> 
torch.Size([18, 2]) 
tensor([[0.5078, 0.4922],
        [0.5535, 0.4465],
        [0.5078, 0.4922],
        [0.5535, 0.4465],
        [0.5078, 0.4922],
        [0.5078, 0.4922],
        [0.5078, 0.4922],
        [0.5078, 0.4922],
        [0.5078, 0.4922],
        [0.5078, 0.4922],
        [0.4781, 0.5219],
        [0.5078, 0.4922],
        [0.5078, 0.4922],
        [0.5078, 0.4922],
        [0.5535, 0.4465],
        [0.5078, 0.4922],
        [0.3002, 0.6998],
        [0.5078, 0.4922]], grad_fn=<SoftmaxBackward>)

The output is still a tensor with shape 18x2, although the values have changed and each pair now sums to 1 because of the softmax.

But I still don’t know how to interpret it to my desired output!
I need a single pair of probabilities for each class label.

mufeili · January 2, 2020, 1:37pm

What do you mean by “interpret it to my desired output”? What’s your objective function? The first column yields the probabilities for class 0 and the second column yields the probabilities for class 1. You can also consider them as the probabilities for not being class 1 and the probabilities for not being class 0.

ghaffarian · January 2, 2020, 1:54pm

@mufeili When I only have 2 class labels, I need only 2 numbers from my model:

Probability/Confidence to classify as class #0
Probability/Confidence to classify as class #1

You said: “The first column yields the probabilities for class 0 and the second column yields the probabilities for class 1”.
How do you get one probability value for class #0 or class #1 when each column contains 18 values?
Can you please provide a code snippet based on mine to show how you do this?

mufeili · January 3, 2020, 6:02am

I presume this just means you have “18” graphs in this batch and each row represents the probabilities for each graph?

ghaffarian · January 4, 2020, 7:06am

@mufeili That’s the point … No, the input graph is not a batched graph; it is only a single DGL-Graph with a single label value (0 / 1)!

I even made it explicit in the code I provided in post #16 to prevent any misunderstandings!

So why am I getting multiple pairs of outputs when feeding a single graph to the network? Have I made a mistake in the GCN implementation?

Here is my GCN code:

class GCN(torch.nn.Module):

    def __init__(self, in_feats, n_hidden, n_classes, n_layers, activation, dropout):
        super(GCN, self).__init__()
        self.layers = torch.nn.ModuleList()
        # input layer
        self.layers.append(GraphConv(in_feats, n_hidden, activation=activation))
        # hidden layers
        for i in range(n_layers - 1):
            self.layers.append(GraphConv(n_hidden, n_hidden, activation=activation))
        # output layer
        self.layers.append(GraphConv(n_hidden, n_classes))
        self.dropout = torch.nn.Dropout(p=dropout)

    def forward(self, g):
        h = g.ndata['vec']
        for i, layer in enumerate(self.layers):
            if i != 0:
                h = self.dropout(h)
            h = layer(g, h)
        return h

And here is the training code (batching is only performed on train-data):

# Load data
data = MyDataset(...)
train_data, test_data = data.split_train_test(test_ratio=0.2)
data_loader = DataLoader(train_data, batch_size=32, shuffle=True, collate_fn=collate)

# Create model
gcn_model = GCN(data.features_size, hidden_units, data.num_classes, hidden_layers, func.relu, dropout)
gcn_model.train()  # set the model in training mode

# Training
loss_func = torch.nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(gcn_model.parameters(), lr=learn_rate, weight_decay=weight_decay)
for epoch in range(epochs):
    epoch_loss = 0
    for batch_id, (batched_graphs, batched_labels) in enumerate(data_loader):
        prediction = gcn_model(batched_graphs)
        batched_graphs.ndata['bp'] = prediction
        graph_repr = dgl.sum_nodes(batched_graphs, 'bp')
        loss = loss_func(graph_repr, batched_labels)
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        epoch_loss += loss.detach().item()
    epoch_loss /= (batch_id + 1)
    print('epoch #%02d, loss = %.4f' % (epoch, epoch_loss))

And finally the test code (no batching on test-data):

# test-data is a list of pairs: (DGL-graph, label)
gcn_model.eval()  # set the model in evaluation mode
for graph, label in test_data:
    output = gcn_model(graph)
    softmax_output = torch.softmax(output, dim=1)
    print(type(softmax_output),
          softmax_output.shape,
          softmax_output)

mufeili · January 5, 2020, 2:18pm

@ghaffarian When using GNN-based approaches for graph-level prediction, we first update the node representations, then compute a graph-level representation from the node-level representations, and finally make a prediction. So in your case, I guess you did not compute the graph-level representation from the node-level representations. In the simplest case, you can take a sum over the node representations, as @VoVAllen suggested here. Also see the corresponding section in the tutorial.
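Put together for a single test graph, the three steps could look like the plain-Python sketch below: sum the per-node rows into one graph-level row, apply softmax, and pick the most probable class. `predict_graph` is a hypothetical helper, and the three input rows are copied from the node-level output printed earlier in the thread purely as sample numbers.

```python
import math

def predict_graph(node_logits):
    """Node-level rows -> (class probabilities, predicted label) for one graph."""
    # Readout: column-wise sum over all nodes of the graph.
    graph_logits = [sum(col) for col in zip(*node_logits)]
    # Softmax over the classes, shifted by the max for numerical stability.
    m = max(graph_logits)
    exps = [math.exp(v - m) for v in graph_logits]
    total = sum(exps)
    probs = [e / total for e in exps]
    return probs, probs.index(max(probs))

node_logits = [[0.0212, -0.0212],
               [-0.1575, 0.0535],
               [-0.5680, 0.3171]]
probs, label = predict_graph(node_logits)
print(len(node_logits), "node rows ->", len(probs), "probabilities, label", label)
```

The readout used at test time should be the same one used during training (a sum in the training code above), otherwise the model is evaluated on a different quantity than it was optimized for.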

ghaffarian · January 9, 2020, 6:56am

Thanks a lot @mufeili.
It seems I had misunderstood how GCNs work; but now I get it.
I solved it by following your suggestion: applying dgl.sum_nodes and then torch.softmax on the output.
If you know of any better method than a sum, I’ll be glad to hear about it.
Even a reference would be great to deepen my understanding of the topic.

mufeili · January 10, 2020, 9:10am

See https://github.com/dmlc/dgl/blob/master/python/dgl/nn/pytorch/glob.py
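To my understanding, the linked module includes sum, average, and max pooling layers, among more elaborate readouts. The plain-Python sketch below only illustrates how those three simple readouts differ on one graph; `readout` is a hypothetical function and does not reproduce DGL's API.

```python
def readout(node_rows, how="sum"):
    """Combine the node rows of one graph into a single row."""
    columns = list(zip(*node_rows))
    if how == "sum":
        return [sum(c) for c in columns]
    if how == "mean":
        return [sum(c) / len(c) for c in columns]
    if how == "max":
        return [max(c) for c in columns]
    raise ValueError("unknown readout: %s" % how)

rows = [[1.0, 4.0], [3.0, 0.0]]
print(readout(rows, "sum"))   # [4.0, 4.0]
print(readout(rows, "mean"))  # [2.0, 2.0]
print(readout(rows, "max"))   # [3.0, 4.0]
```

One practical difference: a sum grows with the number of nodes, so graphs of very different sizes produce outputs on different scales, while a mean or max does not depend on graph size in that way.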