Training with randomly generated synthetic features

Dear all,

I’ve been working on implementing node and edge features in a GCN for binary node classification, using some randomly generated synthetic features (node & edge). The training always reports an accuracy > 1.00, which cannot be right. The loss also looks wrong…

Epoch 00000 | Loss 0.6931 | Train Acc 3393.0000 | Test Acc 890.0000 | Time(s) 0.0772
Epoch 00001 | Loss 0.6931 | Train Acc 3393.0000 | Test Acc 890.0000 | Time(s) 0.1543
Epoch 00002 | Loss 0.6931 | Train Acc 3393.0000 | Test Acc 890.0000 | Time(s) 0.2339

I’m not sure if the random synthetic data I generated caused it. I usually randomize node features just to check that my architecture runs without errors, and it normally still gives me accuracy < 1.0.

So, has anybody had the same problem? Or can anybody show me a simple implementation that uses node and edge features for binary node classification?

Thanks all

For accuracy numbers greater than 1, most likely you have not divided the number of correct predictions by the number of samples. For the unchanged loss, you can print the loss at each iteration to check whether it really decreases over the first few iterations. If not, most likely you forgot to call optimizer.step(), or the computation graph is broken somewhere so backpropagation cannot reach the parameters.
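For reference, here is a minimal sketch of such an accuracy computation (with made-up predictions and labels, just to illustrate the division):

import torch

# made-up predictions and labels, only to illustrate the computation
logits = torch.randn(5, 2)                       # 5 samples, 2 classes
labels = torch.tensor([0, 1, 1, 0, 1])

_, indices = torch.max(logits, dim=1)            # predicted class per sample
correct = torch.sum(indices == labels)           # number of correct predictions
accuracy = correct.item() * 1.0 / len(labels)    # divide by the number of samples
print(accuracy)                                  # always between 0 and 1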

Thanks @mufeili,

I think I have divided the number of correct predictions by the number of samples with return correct.item() * 1.0 / len(labels).

And I do call optimizer.step() after I call the backward function.

So now I’m thinking that the computation graph is broken. If that is the case, how do I check whether it is?

Thanks so much

  1. Can you check the shape of labels? Also, how is correct computed? (See the illustration below for why the shape can matter.)
  2. After you call loss.backward(), check whether the model parameters have gradients. If not, then the gradient flow is broken somewhere.
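To illustrate the first point with a made-up example (not your actual code): if labels has shape [N, 1] while the predicted indices have shape [N], the comparison broadcasts to an [N, N] matrix, and the count of “correct” predictions can exceed the number of samples:

import torch

indices = torch.tensor([0, 1, 0])         # predicted classes, shape [3]
labels = torch.tensor([[0], [1], [1]])    # labels with shape [3, 1]
correct = torch.sum(indices == labels)    # == broadcasts to a [3, 3] comparison
print(correct.item())                     # prints 4, more than the 3 samples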

Thanks @mufeili,

  1. The size of my labels is torch.Size([5846, 1]), and I calculate correct as follows:
def evaluate(model, g, nfeats, efeats, labels, mask):
    model.eval()
    with torch.no_grad():
        logits = model(g, nfeats, efeats)
        logits = logits[mask]
        labels = labels[mask]
        _, indices = torch.max(logits, dim=1)
        correct = torch.sum(indices == labels)
        return correct.item() * 1.0 / len(labels)
  2. I printed out g.ndata['h'] and g.ndata['h_neigh'] in each layer; they carry the attributes grad_fn=<ReluBackward0> and grad_fn=<CopyReduceBackward> respectively. My implementation is still the one in Accuracy of training beyond 1.0.

Thanks so much

  1. Your code block looks fine, but since the accuracy is wrong, something must be off elsewhere. I’d recommend printing more intermediate results to figure out what’s going on.
  2. Can you print the loss to see if it changes over epochs?

Thanks @mufeili,

I tried printing out the intermediate results, which are shown below:

logits: tensor([[0.0356],
        [0.4634],
        [0.0000],
        ...,
        [0.0084],
        [0.0000],
        [0.3191]], grad_fn=<ReluBackward0>)
logp: tensor([0., 0., 0.,  ..., 0., 0., 0.], grad_fn=<ViewBackward>) torch.Size([5707])
loss tensor(0.6931, grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
Correct: tensor(947)  | Labels: tensor([1., 0., 1.,  ..., 1., 1., 0.])  | Indices: tensor([0, 0, 0,  ..., 0, 0, 0])
test_acc: 0.4937434827945777
Correct: tensor(993)  | Labels: tensor([0., 1., 1.,  ..., 1., 1., 1.])  | Indices: tensor([0, 0, 0,  ..., 0, 0, 0])
train_acc: 0.5105398457583548
tst_loss tensor(0.6931, grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
Epoch 00000 | Loss 0.6931 | Train Acc 0.5105 | Test Acc 0.4937 | Time(s) 0.0491
***************************************************************************
logits: tensor([[0.1085],
        [0.2853],
        [0.0793],
        ...,
        [0.5197],
        [0.1882],
        [0.2970]], grad_fn=<ReluBackward0>)
logp: tensor([0., 0., 0.,  ..., 0., 0., 0.], grad_fn=<ViewBackward>) torch.Size([5707])
loss tensor(0.6931, grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
Correct: tensor(947)  | Labels: tensor([1., 0., 1.,  ..., 1., 1., 0.])  | Indices: tensor([0, 0, 0,  ..., 0, 0, 0])
test_acc: 0.4937434827945777
Correct: tensor(993)  | Labels: tensor([0., 1., 1.,  ..., 1., 1., 1.])  | Indices: tensor([0, 0, 0,  ..., 0, 0, 0])
train_acc: 0.5105398457583548
tst_loss tensor(0.6931, grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
Epoch 00001 | Loss 0.6931 | Train Acc 0.5105 | Test Acc 0.4937 | Time(s) 0.0918
***************************************************************************
logits: tensor([[0.0543],
        [0.4011],
        [0.0465],
        ...,
        [0.2958],
        [0.0811],
        [0.2602]], grad_fn=<ReluBackward0>)
logp: tensor([0., 0., 0.,  ..., 0., 0., 0.], grad_fn=<ViewBackward>) torch.Size([5707])
loss tensor(0.6931, grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
Correct: tensor(947)  | Labels: tensor([1., 0., 1.,  ..., 1., 1., 0.])  | Indices: tensor([0, 0, 0,  ..., 0, 0, 0])
test_acc: 0.4937434827945777
Correct: tensor(993)  | Labels: tensor([0., 1., 1.,  ..., 1., 1., 1.])  | Indices: tensor([0, 0, 0,  ..., 0, 0, 0])
train_acc: 0.5105398457583548
tst_loss tensor(0.6931, grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
Epoch 00002 | Loss 0.6931 | Train Acc 0.5105 | Test Acc 0.4937 | Time(s) 0.1351
***************************************************************************
logits: tensor([[0.0834],
        [0.3571],
        [0.0044],
        ...,
        [0.0896],
        [0.3438],
        [0.2082]], grad_fn=<ReluBackward0>)
logp: tensor([0., 0., 0.,  ..., 0., 0., 0.], grad_fn=<ViewBackward>) torch.Size([5707])
loss tensor(0.6931, grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
Correct: tensor(947)  | Labels: tensor([1., 0., 1.,  ..., 1., 1., 0.])  | Indices: tensor([0, 0, 0,  ..., 0, 0, 0])
test_acc: 0.4937434827945777
Correct: tensor(993)  | Labels: tensor([0., 1., 1.,  ..., 1., 1., 1.])  | Indices: tensor([0, 0, 0,  ..., 0, 0, 0])
train_acc: 0.5105398457583548
tst_loss tensor(0.6931, grad_fn=<BinaryCrossEntropyWithLogitsBackward>)
Epoch 00003 | Loss 0.6931 | Train Acc 0.5105 | Test Acc 0.4937 | Time(s) 0.1768

Although the accuracy is now between 0 and 1, the loss and accuracy (both train and test) do not change at all, even when I set the number of epochs to 100. So I still think something is going wrong there, but I could not figure out what causes it.

Thanks all for helping me out.

  1. I assume the input node features are treated as given and they are not learned from scratch?
  2. You can check the gradients of the parameters and see if they are nonzero. An example is given below:
import torch

feats = torch.randn(2, 2)            # toy input features
scores = torch.randn(2, 1)           # toy regression targets
model = torch.nn.Linear(2, 1)
pred = model(feats)
loss = ((pred - scores) ** 2).sum()  # simple squared-error loss
loss.backward()
for param in model.parameters():
    print(param.grad)                # nonzero gradients mean the graph is intact

Thanks @mufeili,

  1. Yes, they (the node features) are treated as given. They have a shape of torch.Size([5925, 9]). But I’m not sure what you mean by “they are not learned from scratch”.
  2. I discovered that the gradients of the parameters are all zeros. How should I fix that, and what causes it?

Thanks so much

  1. There can be cases where node features are embeddings learned from scratch, in which case additional issues might arise.
  2. That makes sense, since your loss does not change at all. Can you disable dropout and see if the issue still exists? Also, did you do anything special for weight initialization? (A quick way to check is sketched below.)
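For example, a generic way to eyeball the initialization (assuming model is your network; this is just a sketch, not specific to your architecture):

# print the shape and mean magnitude of every parameter right after construction
for name, param in model.named_parameters():
    print(name, param.shape, param.abs().mean().item())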

Thanks @mufeili,

  1. I think my node features are not embeddings learned from scratch.
  2. I tried disabling the dropout, but the problem still persists.

I tried to simplify the model as much as possible, but I cannot find anything wrong with it.

class GNNLayer3(nn.Module):
  def __init__(self, ndim_in, edims, ndim_out):
    super(GNNLayer3, self).__init__()
    print("ndim_in:",ndim_in, "ndim_out:",ndim_out, "edims:",edims)
    self.W_msg = nn.Linear(ndim_in + edims, ndim_out) 
    self.W_apply = nn.Linear(ndim_in + ndim_out, ndim_out) 
    
  def message_func(self, edges):
    msg = torch.cat([edges.src['h'],edges.data['h']], 1)
    return {'m': F.relu(self.W_msg(msg))}  
    
  def forward(self, gc, nfeats, efeats):
    with gc.local_scope():
      gc.ndata['h'] = nfeats; gc.edata['h'] = efeats          
      gc.update_all(self.message_func, 
                   fn.sum('m', 'h_neigh'))      
      app = torch.cat([gc.ndata['h'], gc.ndata['h_neigh']], 1)
      gc.ndata['h'] = F.relu(self.W_apply(app))
      return gc.ndata['h']
# ===================================================================

class NetArchie3(nn.Module):
  def __init__(self, ndim_in, ndim_out, edim):
    super(NetArchie3, self).__init__()
    self.layers = nn.ModuleList()
    self.layers.append(GNNLayer3(ndim_in, edim, 50))
    self.layers.append(GNNLayer3(50, edim, ndim_out))

  def forward(self, g_dgl, nfeats, efeats):
    for i, layer in enumerate(self.layers):
      nfeats = layer(g_dgl, nfeats, efeats)
    return nfeats
# ===================================================================

model = NetArchie3(9, 1, 2)

def evaluate(model, g, nfeats, efeats, labels, mask):
    model.eval()
    with torch.no_grad():
        logits = model(g, nfeats, efeats)
        logits = logits[mask]
        labels = labels[mask]
        _, indices = torch.max(logits, dim=1)
        correct = torch.sum(indices == labels)
        return correct.item() * 1.0 / len(labels)

for epoch in range(10):
  model.train()
  logits = model(gc, gc.ndata['hn'], gc.edata['he'])
  logp = F.log_softmax(logits, 1)
  logpr = logp.view(gc.number_of_nodes(),)
  loss = loss_fcn(logpr[train_mask], labels[train_mask])
  optimizer.zero_grad(); loss.backward(); optimizer.step()
  test_acc = evaluate(model, gc, gc.ndata['hn'], gc.edata['he'], labels, test_mask)
  print("test_acc:",test_acc, "loss", loss.items())

Perhaps you can spot something peculiar in the design. Thanks mufeili.

Can you give a code snippet that I can run directly to reproduce the issue? My guess is that something is wrong in these lines:

logp = F.log_softmax(logits, 1)
logpr = logp.view(gc.number_of_nodes(),)
loss = loss_fcn(logpr[train_mask], labels[train_mask])

Thanks for your time mufeili,

Although I modified the suspicious lines you mentioned above, the result is still the same. Here is the full version of the code:

import random

import networkx as nx
import numpy as np
import torch
import torch.nn as nn
import torch.nn.functional as F
import dgl
import dgl.function as fn

g_nx = nx.random_regular_graph(random.randint(1,3),100)
gc = dgl.DGLGraph()
gc.from_networkx(g_nx) 

gc.ndata['hn'] = torch.randint(0, 2, (gc.number_of_nodes(),9), dtype=torch.float32)
gc.edata['he'] = torch.randint(0, 2, (gc.number_of_edges(),2), dtype=torch.float32)
labels = torch.randint(0, 2, (gc.number_of_nodes(),1), dtype=torch.float32)

TTE_possible = [[True,False,False],[False,True,False],[False,False,True]]
TTE_indices = np.random.choice(len(TTE_possible), gc.number_of_nodes(), replace=True)
TTE = [TTE_possible[i] for i in TTE_indices]
train_mask = torch.tensor([TTE[i][0] for i in range(len(TTE))])
test_mask = torch.tensor([TTE[i][1] for i in range(len(TTE))])
eval_mask = torch.tensor([TTE[i][2] for i in range(len(TTE))])

class GNNLayer3(nn.Module):
  def __init__(self, ndim_in, edims, ndim_out):
    super(GNNLayer3, self).__init__()
    #print("ndim_in:",ndim_in, "ndim_out:",ndim_out, "edims:",edims)
    self.W_msg = nn.Linear(ndim_in + edims, ndim_out) 
    self.W_apply = nn.Linear(ndim_in + ndim_out, ndim_out) 
    
  def message_func(self, edges):
    msg = torch.cat([edges.src['h'],edges.data['h']], 1)
    return {'m': F.relu(self.W_msg(msg))}  
    
  def forward(self, gc, nfeats, efeats):
    with gc.local_scope():
      gc.ndata['h'] = nfeats; gc.edata['h'] = efeats          
      gc.update_all(self.message_func, 
                   fn.sum('m', 'h_neigh'))      
      app = torch.cat([gc.ndata['h'], gc.ndata['h_neigh']], 1)
      gc.ndata['h'] = F.relu(self.W_apply(app))
      return gc.ndata['h']
# ===================================================================

class NetArchie3(nn.Module):
  def __init__(self, ndim_in, ndim_out, edim):
    super(NetArchie3, self).__init__()
    self.layers = nn.ModuleList()
    self.layers.append(GNNLayer3(ndim_in, edim, 50))
    self.layers.append(GNNLayer3(50, edim, ndim_out))

  def forward(self, g_dgl, nfeats, efeats):
    for i, layer in enumerate(self.layers):
      nfeats = layer(g_dgl, nfeats, efeats)
    return nfeats
# ===================================================================

model = NetArchie3(9, 1, 2)

def evaluate(model, g, nfeats, efeats, labels, mask):
    model.eval()
    with torch.no_grad():
        logits = model(g, nfeats, efeats)
        logits = logits[mask]
        labels = labels[mask]
        _, indices = torch.max(logits, dim=1)
        correct = torch.sum(indices == labels)
        return correct.item() * 1.0 / len(labels)

for epoch in range(10):
  model.train()
  logits = model(gc, gc.ndata['hn'], gc.edata['he'])
  logp = F.log_softmax(logits, 1)
  loss = loss_fcn(logp[train_mask], labels[train_mask])
  optimizer.zero_grad(); loss.backward(); optimizer.step()
  test_acc = evaluate(model, gc, gc.ndata['hn'], gc.edata['he'], labels, test_mask)
  print("test_acc:",test_acc, "loss", loss.item())

How did you define loss_fcn and optimizer?

I’m sorry, I forgot to mention them. They are defined as

optimizer = torch.optim.Adam(model.parameters(), lr=1e-2)
loss_fcn = torch.nn.BCEWithLogitsLoss()

Then you simply need to replace

logp = F.log_softmax(logits, 1)
loss = loss_fcn(logp[train_mask], labels[train_mask])

with

loss = loss_fcn(logits[train_mask], labels[train_mask])

The reason is that BCEWithLogitsLoss already applies a sigmoid inside the loss computation (it combines a Sigmoid layer with BCELoss), so you should not apply log_softmax (or a sigmoid) yourself. In fact, log_softmax over dim=1 of an [N, 1] tensor is always zero, which is why your logp was all zeros and the loss was stuck at 0.6931 (= ln 2).
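Using the variables from your loop, the training step would then look roughly like this (a sketch, keeping everything else unchanged):

logits = model(gc, gc.ndata['hn'], gc.edata['he'])        # raw scores, shape [num_nodes, 1]
loss = loss_fcn(logits[train_mask], labels[train_mask])   # BCEWithLogitsLoss applies the sigmoid internally
optimizer.zero_grad(); loss.backward(); optimizer.step()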

Thanks Mufeili,

I should have realized that. But although the loss now decreases, my accuracy doesn’t change, even when I apply dropout. Could this affect the evaluate function? I just reused that function from https://docs.dgl.ai/en/0.4.x/tutorials/models/1_gnn/1_gcn.html.

Thanks mufeili

Did you check training accuracy?

Unfortunately, the training accuracy behaves similarly to the test accuracy: neither of them changes.

However, I suspect it is because of the shape of my target labels, generated with
labels = torch.randint(0, 2, (gc.number_of_nodes(),1), dtype=torch.float32), which differs from the shape of the labels in the Cora dataset. But the architecture I made already accepts the shape I defined, and reshaping the labels causes a dimension mismatch in the loss function: “ValueError: Target size (torch.Size([33])) must be the same as input size (torch.Size([33, 1]))”.

Thank you very much mufeili

The thing is that for this synthetic dataset, all labels, node features and edge features are random and do not have any semantic meaning. In such a case, it should be expected that the model won’t work at all.
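One more thing that may be worth checking (an observation from the code above; I have not run it): the evaluate function you reused from the GCN tutorial takes torch.max(logits, dim=1), and for a model that outputs a single logit per node (shape [N, 1]) the argmax over dim=1 is always 0, so the reported accuracy cannot change. For a single-output binary classifier, predictions are usually obtained by thresholding the logit instead, e.g. something like:

def evaluate(model, g, nfeats, efeats, labels, mask):
    model.eval()
    with torch.no_grad():
        logits = model(g, nfeats, efeats)      # shape [num_nodes, 1]
        logits = logits[mask]
        labels = labels[mask]
        preds = (logits > 0).float()           # sigmoid(logit) > 0.5  <=>  logit > 0
        correct = torch.sum(preds == labels)   # both have shape [num_masked, 1]
        return correct.item() * 1.0 / len(labels)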