Suggestions to improve model performance

Hey all,

I have defined my GCN model the same way as in the example (shown below), and I want to do binary node classification (only two labels, 0 and 1).

import torch.nn as nn
from dgl.nn import GraphConv

class GCN(nn.Module):
    def __init__(self, in_feats, n_hidden, n_classes, n_layers, activation, dropout):
        super(GCN, self).__init__()
        self.layers = nn.ModuleList()
        # input layer
        self.layers.append(GraphConv(in_feats, n_hidden, activation=activation))
        # hidden layers
        for i in range(n_layers - 1):
            self.layers.append(GraphConv(n_hidden, n_hidden, activation=activation))
        # output layer (no activation: raw logits for the loss function)
        self.layers.append(GraphConv(n_hidden, n_classes))
        self.dropout = nn.Dropout(p=dropout)

    def forward(self, g_dgl, features):
        h = features
        for i, layer in enumerate(self.layers):
            if i != 0:  # no dropout on the raw input features
                h = self.dropout(h)
            h = layer(g_dgl, h)
        return h

When creating the model, I used the following configuration:
dropout = 0.5; n_hidden = 3; n_layers = 1; learning_rate = 0.01; weight_decay = 0.0005
In addition, I used CrossEntropyLoss() as my loss function and Adam as the optimizer.
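
For reference, this is roughly how I create and train the model (a minimal sketch; in_feats, g, features, labels, and train_mask stand in for my actual data):

import torch
import torch.nn as nn

# hypothetical names: g is the DGLGraph, features/labels/train_mask are node tensors
model = GCN(in_feats=in_feats, n_hidden=3, n_classes=2, n_layers=1,
            activation=torch.relu, dropout=0.5)
loss_fn = nn.CrossEntropyLoss()
optimizer = torch.optim.Adam(model.parameters(), lr=0.01, weight_decay=0.0005)

for epoch in range(1000):
    model.train()
    logits = model(g, features)                      # [num_nodes, 2] raw logits
    loss = loss_fn(logits[train_mask], labels[train_mask])
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()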

Due to the nature of the problem, the node labels in my training graph are imbalanced: 59139 nodes are labelled 0 and 20615 nodes are labelled 1. However, the trained model performs so badly that it never predicts a node with label 1 correctly.

I’m still working on including edge features in the model, but I doubt that will improve the model’s ability to predict label 1. So I’m asking: is there anything more I can do to make the predictions better?

Thanks a lot for any suggestions.

I feel your label distribution is not that skewed (around 1:3), so I wonder whether there are other problems. Could you plot the training curve here?

In terms of label skewness, I can think of two solutions (see the sketch below):

1. Re-weight the loss so that mistakes on the minority class (label 1) cost more, e.g. the weight argument of CrossEntropyLoss or the pos_weight argument of BCEWithLogitsLoss.
2. Use a focal loss, which down-weights easy examples so that training focuses on the hard ones.
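
For example (a rough sketch; the class counts are taken from your post, and FocalLoss here is a minimal hand-rolled version, not a library class):

import torch
import torch.nn as nn
import torch.nn.functional as F

# 1. re-weight CrossEntropyLoss by inverse class frequency
counts = torch.tensor([59139., 20615.])       # label 0 / label 1 counts from your post
weight = counts.sum() / (2 * counts)          # the minority class gets the larger weight
weighted_loss = nn.CrossEntropyLoss(weight=weight)

# 2. a minimal focal loss over two-class logits
def focal_loss(logits, targets, gamma=2.0):
    # logits: [N, 2], targets: [N] holding 0/1 labels
    ce = F.cross_entropy(logits, targets, reduction='none')
    p_t = torch.exp(-ce)                      # model's probability for the true class
    return ((1.0 - p_t) ** gamma * ce).mean()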

Thanks for the reply @minjie,

How do I generate a training curve with DGL?
Here are the last 20 epochs from my training:

Epoch 00980 | Time(s) 0.0153 | Loss 0.4377 | Accuracy 0.8213 | ETputs(KTEPS) 23904.23
Epoch 00981 | Time(s) 0.0153 | Loss 0.4385 | Accuracy 0.8213 | ETputs(KTEPS) 23904.56
Epoch 00982 | Time(s) 0.0153 | Loss 0.4365 | Accuracy 0.8212 | ETputs(KTEPS) 23904.71
Epoch 00983 | Time(s) 0.0153 | Loss 0.4381 | Accuracy 0.8211 | ETputs(KTEPS) 23904.45
Epoch 00984 | Time(s) 0.0153 | Loss 0.4386 | Accuracy 0.8211 | ETputs(KTEPS) 23902.14
Epoch 00985 | Time(s) 0.0153 | Loss 0.4396 | Accuracy 0.8210 | ETputs(KTEPS) 23903.45
Epoch 00986 | Time(s) 0.0153 | Loss 0.4385 | Accuracy 0.8208 | ETputs(KTEPS) 23902.70
Epoch 00987 | Time(s) 0.0153 | Loss 0.4389 | Accuracy 0.8208 | ETputs(KTEPS) 23903.11
Epoch 00988 | Time(s) 0.0153 | Loss 0.4391 | Accuracy 0.8207 | ETputs(KTEPS) 23903.38
Epoch 00989 | Time(s) 0.0153 | Loss 0.4374 | Accuracy 0.8205 | ETputs(KTEPS) 23903.97
Epoch 00990 | Time(s) 0.0153 | Loss 0.4401 | Accuracy 0.8204 | ETputs(KTEPS) 23904.03
Epoch 00991 | Time(s) 0.0153 | Loss 0.4379 | Accuracy 0.8204 | ETputs(KTEPS) 23904.39
Epoch 00992 | Time(s) 0.0153 | Loss 0.4387 | Accuracy 0.8204 | ETputs(KTEPS) 23902.57
Epoch 00993 | Time(s) 0.0153 | Loss 0.4395 | Accuracy 0.8204 | ETputs(KTEPS) 23903.27
Epoch 00994 | Time(s) 0.0153 | Loss 0.4408 | Accuracy 0.8208 | ETputs(KTEPS) 23902.76
Epoch 00995 | Time(s) 0.0153 | Loss 0.4367 | Accuracy 0.8208 | ETputs(KTEPS) 23903.29
Epoch 00996 | Time(s) 0.0153 | Loss 0.4433 | Accuracy 0.8210 | ETputs(KTEPS) 23904.11
Epoch 00997 | Time(s) 0.0153 | Loss 0.4376 | Accuracy 0.8208 | ETputs(KTEPS) 23904.79
Epoch 00998 | Time(s) 0.0153 | Loss 0.4395 | Accuracy 0.8204 | ETputs(KTEPS) 23905.55
Epoch 00999 | Time(s) 0.0153 | Loss 0.4333 | Accuracy 0.8204 | ETputs(KTEPS) 23902.47

I did try nn.BCEWithLogitsLoss() and nn.BCELoss() as alternative loss functions to CrossEntropy, but they produce an error saying
ValueError: Target size (torch.Size([55827])) must be the same as input size (torch.Size([55827, 2])).
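
My guess is that the model outputs two logits per node (n_classes=2) while BCE-style losses expect a single logit per node. If that's right, something like this should make the shapes match (a rough sketch; in_feats, g, features, labels, and train_mask are placeholders for my data), though I'm not sure it's the right approach:

import torch
import torch.nn as nn

# one output unit instead of two, so logits go from [N, 1] to [N] after squeezing
model = GCN(in_feats=in_feats, n_hidden=3, n_classes=1, n_layers=1,
            activation=torch.relu, dropout=0.5)
# pos_weight upweights the minority class (label 1): 59139 / 20615 ≈ 2.87
loss_fn = nn.BCEWithLogitsLoss(pos_weight=torch.tensor(59139 / 20615))

logits = model(g, features).squeeze(1)        # [N, 1] -> [N]
loss = loss_fn(logits[train_mask], labels[train_mask].float())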

And when I use FocalLoss as you suggested, it raises an error saying
ValueError: Target and input must have the same number of elements. target nelement (55827) != input nelement (111654). I understand that 55827 is the number of nodes I use for training, but I couldn’t figure out what 111654 is.

Thanks @minjie