Debugging an error with RGCN

Hey!

I’m currently working on a node classifier using RGCN.

While searching for the best hyperparameters for the model with Ray Tune, I get the following error for some models, but not all of them.

  File "tpe_search_nc.py", line 195, in build_model
    train(model, optimizer, train_loader)
  File "tpe_search_nc.py", line 45, in train
    bg.edata['kind'].squeeze().long()), 1)
  File "/home/ubuntu/ai2d/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/media/volume/work/C20/networks.py", line 104, in forward
    x = rel_graph_conv(g, x, etypes)
  File "/home/ubuntu/ai2d/lib/python3.6/site-packages/torch/nn/modules/module.py", line 541, in __call__
    result = self.forward(*input, **kwargs)
  File "/home/ubuntu/ai2d/lib/python3.6/site-packages/dgl/nn/pytorch/conv/relgraphconv.py", line 174, in forward
    g.edata['type'] = etypes
  File "/home/ubuntu/ai2d/lib/python3.6/site-packages/dgl/view.py", line 133, in __setitem__
    self._graph.set_e_repr({key : val}, self._edges)
  File "/home/ubuntu/ai2d/lib/python3.6/site-packages/dgl/graph.py", line 1879, in set_e_repr
    nfeats = F.shape(val)[0]
IndexError: tuple index out of range

Any idea what causes the error? It seems to be related to edge features (the failing call is g.edata['type'] = etypes), but I’m confused because the error appears only every now and then, and only with RGCN. GCN and GraphSAGE work just fine.

Thanks in advance for your help, and let me know if you need more information or context in case this turns out to be a bug in DGL.

That’s weird. Could you please provide the shape of the etypes tensor you feed into rel_graph_conv? Thanks.

I’ve been trying to replicate the error today, but I can only manage to do so when running under Ray Tune.

Here’s an example of a tensor passed to etypes:

tensor([4, 2, 2, 2, 2, 2, 2, 2, 2, 6, 1, 1, 1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0,
        2, 2, 1, 1, 1, 2, 1, 1, 1, 2, 1, 2, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3, 3,
        3, 3, 3, 6])

Sorry for the late reply. I cannot reproduce the bug. Would etypes.unsqueeze(-1) help?
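
One possible cause, offered as a guess rather than a confirmed diagnosis: the traceback shows bg.edata['kind'].squeeze().long() being passed as etypes, and squeeze() with no argument removes every size-1 dimension. If edata['kind'] has shape (E, 1) (an assumption, since the thread never states its shape), a batched graph with exactly one edge would be squeezed down to a 0-dimensional tensor. Its shape is an empty tuple, so shape[0] raises "IndexError: tuple index out of range". That would also fit the error showing up only for some trials, for example those with a different batch size. A minimal PyTorch sketch of the effect, with hypothetical helper names:

```python
import torch


def edge_types_squeeze_all(kind):
    # Mirrors the call in the traceback: squeeze() drops ALL size-1 dims.
    return kind.squeeze().long()


def edge_types_flat(kind):
    # Hypothetical alternative: always flatten to a 1-D tensor of length E.
    return kind.reshape(-1).long()


# Assumed layout: one edge type per row, shape (E, 1).
many_edges = torch.tensor([[4], [2], [6]])   # E = 3
single_edge = torch.tensor([[3]])            # E = 1

print(edge_types_squeeze_all(many_edges).shape)   # 1-D, as expected
print(edge_types_squeeze_all(single_edge).shape)  # 0-D: shape is empty
print(edge_types_flat(single_edge).shape)         # stays 1-D

try:
    # Same kind of indexing as nfeats = F.shape(val)[0] in the traceback.
    edge_types_squeeze_all(single_edge).shape[0]
except IndexError as err:
    print("IndexError:", err)
```

If this matches your setup, replacing .squeeze() with .squeeze(-1) or .reshape(-1) in train should keep etypes 1-D for every batch. Printing bg.number_of_edges() and the shape of etypes just before the failing call would confirm or rule this out.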