KeyError of 'h' when using average pooling in evaluation mode

This is the error message:

Traceback (most recent call last):
  File "debug.py", line 280, in <module>
    print([i.item() for i in model.test_step(data, 0)])
  File "debug.py", line 232, in test_step
    L, L_est, L_init, metric = self.forward_mult(batch, opts, True)
  File "debug.py", line 172, in forward_mult
    g, h, L_hat_1, L_hat_2, L_est_A_1, L_est_A_2 = self.forward(g, x, opts)
  File "debug.py", line 146, in forward
    h = self.readout(g, h)
  File "/afs/ece.cmu.edu/usr/xujinl/anaconda3/envs/CSD/lib/python3.7/site-packages/torch/nn/modules/module.py", line 550, in __call__
    result = self.forward(*input, **kwargs)
  File "/afs/ece.cmu.edu/usr/xujinl/anaconda3/envs/CSD/lib/python3.7/site-packages/dgl/nn/pytorch/glob.py", line 168, in forward
    readout = mean_nodes(graph, 'h')
  File "/afs/ece.cmu.edu/usr/xujinl/anaconda3/envs/CSD/lib/python3.7/site-packages/dgl/readout.py", line 198, in mean_nodes
    return readout_nodes(graph, feat, weight, ntype=ntype, op='mean')
  File "/afs/ece.cmu.edu/usr/xujinl/anaconda3/envs/CSD/lib/python3.7/site-packages/dgl/readout.py", line 88, in readout_nodes
    return segment.segment_reduce(graph.batch_num_nodes(ntype), x, reducer=op)
  File "/afs/ece.cmu.edu/usr/xujinl/anaconda3/envs/CSD/lib/python3.7/site-packages/dgl/ops/segment.py", line 56, in segment_reduce
    return g.dstdata['h']
  File "/afs/ece.cmu.edu/usr/xujinl/anaconda3/envs/CSD/lib/python3.7/site-packages/dgl/view.py", line 66, in __getitem__
    return self._graph._get_n_repr(self._ntid, self._nodes)[key]
  File "/afs/ece.cmu.edu/usr/xujinl/anaconda3/envs/CSD/lib/python3.7/site-packages/dgl/frame.py", line 386, in __getitem__
    return self._columns[name].data
KeyError: 'h'

It’s weird that this bug happens at test time but not during training. Any hints or pointers as to why? Thanks. This is how I set up the layer:

self.readout = AvgPooling()
h = self.readout(g, h)

Can you provide a minimal example for reproducing the error?

Sorry, while trying to reproduce the bug I found that this is an error on my part. When removing nodes, my neural network removed too many, so the graph became empty. Perhaps add a warning somewhere when the graph is empty?
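For context, the failure mode can be sketched without DGL: once every node is removed, the node-feature store no longer holds a column named 'h', so the readout's lookup raises a bare KeyError. A minimal stand-in (the `TinyGraph` class below is hypothetical, not DGL's API):

```python
class TinyGraph:
    """Minimal stand-in for a graph with a node-feature store (not DGL's API)."""
    def __init__(self, num_nodes, feat):
        self.num_nodes = num_nodes
        self.ndata = {'h': feat}      # per-node features, keyed by name

    def remove_nodes(self, node_ids):
        self.num_nodes -= len(node_ids)
        if self.num_nodes == 0:
            self.ndata.clear()        # an empty graph keeps no feature columns

def mean_readout(g):
    # Mirrors what mean_nodes(graph, 'h') does: average 'h' over all nodes.
    feats = g.ndata['h']              # KeyError: 'h' when the graph is empty
    return sum(feats) / len(feats)

g = TinyGraph(3, [1.0, 2.0, 3.0])
print(mean_readout(g))                # works on a non-empty graph
g.remove_nodes([0, 1, 2])             # removes every node
try:
    mean_readout(g)
except KeyError as e:
    print('KeyError:', e)
```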

I see two ways to fix this issue:

  1. Raise an error when trying to retrieve node/edge features from a graph with 0 nodes/edges.
  2. Allow setting empty tensors for the features of a graph with no nodes/edges.
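Option 1 might look like the guard below, which replaces the opaque KeyError with a descriptive message (a sketch with a hypothetical helper, not DGL's actual internals):

```python
def get_node_feature(ndata, name, num_nodes):
    """Option 1 sketch: fail loudly when reading features of an empty graph."""
    if num_nodes == 0:
        raise ValueError(
            f"Cannot read node feature '{name}': the graph has 0 nodes. "
            "Did an earlier step remove every node?")
    return ndata[name]

# Normal case: the feature is returned unchanged.
print(get_node_feature({'h': [1.0, 2.0]}, 'h', 2))

# Empty-graph case: a clear error instead of KeyError: 'h'.
try:
    get_node_feature({}, 'h', 0)
except ValueError as e:
    print(e)
```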

Which seems more reasonable to you?

The first solution sounds more fitting to my application.