Question about HGT models in the new version (0.8)

xixi-baba · March 16, 2022, 4:16pm

I couldn't find an example of an HGT model that uses HGTConv: https://docs.dgl.ai/_modules/dgl/nn/pytorch/conv/hgtconv.html#HGTConv

I followed the OpenHGNN example (OpenHGNN/HGT.py at main · BUPT-GAMMA/OpenHGNN · GitHub) to implement my own model, but I get the following error:

  File "xxx.py", line 218, in forward
    h = self.gcs[i](g, h, g.ndata['_TYPE'], g.edata['_TYPE'], presorted = True)
  File "/usr/local/anaconda3/envs/d80/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/anaconda3/envs/d80/lib/python3.7/site-packages/dgl/nn/pytorch/conv/hgtconv.py", line 135, in forward
    g.apply_edges(self.message)
  File "/usr/local/anaconda3/envs/d80/lib/python3.7/site-packages/dgl/heterograph.py", line 4441, in apply_edges
    edata = core.invoke_edge_udf(g, eid, etype, func)
  File "/usr/local/anaconda3/envs/d80/lib/python3.7/site-packages/dgl/core.py", line 85, in invoke_edge_udf
    return func(ebatch)
  File "/usr/local/anaconda3/envs/d80/lib/python3.7/site-packages/dgl/nn/pytorch/conv/hgtconv.py", line 160, in message
    m.append(self.relation_msg[i](v[i], etype, self.presorted))  # (E, O)
  File "/usr/local/anaconda3/envs/d80/lib/python3.7/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
    result = self.forward(*input, **kwargs)
  File "/usr/local/anaconda3/envs/d80/lib/python3.7/site-packages/dgl/nn/pytorch/linear.py", line 170, in forward
    pos_r = torch.cat([pos_l[1:], torch.tensor([len(x_type)], device=x.device)])
RuntimeError: CUDA error: device-side assert triggered

I checked the values passed to the HGTConv constructor, and they are compatible with g.ndata['_TYPE'] and g.edata['_TYPE'].
I also printed pos_l. The error seems to occur when pos_l has only one element and pos_l[1:] is evaluated.
However, the underlying cause is not clear to me. Could you provide an official example of HGTConv?

BarclayII · March 19, 2022, 4:33am

A CUDA device-side assert usually means that some indices are out of bounds or that there is a NaN. Did you try running on CPU? That could help us locate the problem.
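
One way to test the out-of-bounds hypothesis before calling the layer is to validate the type tensors directly. The following is a minimal sketch, not part of DGL: `check_type_ids` is a hypothetical helper, and `num_types` is assumed to be the `num_ntypes` or `num_etypes` value passed to the HGTConv constructor. It also checks that the IDs are sorted, on the assumption that `presorted=True` expects nodes and edges to be ordered by type.

```python
def check_type_ids(type_ids, num_types, name="type ids"):
    """Return a list of problems found in a sequence of integer type IDs.

    Hypothetical helper, not a DGL API. Accepts a Python list or any
    object with a .tolist() method, such as a torch tensor.
    """
    ids = type_ids.tolist() if hasattr(type_ids, "tolist") else list(type_ids)
    problems = []
    if not ids:
        problems.append(f"{name}: empty")
        return problems
    lo, hi = min(ids), max(ids)
    # Every ID must index a valid type, i.e. lie in [0, num_types - 1].
    if lo < 0 or hi >= num_types:
        problems.append(
            f"{name}: values span [{lo}, {hi}], expected [0, {num_types - 1}]"
        )
    # Assumption: presorted=True is only safe for non-decreasing type IDs.
    if any(a > b for a, b in zip(ids, ids[1:])):
        problems.append(f"{name}: not sorted by type")
    return problems


# Toy data standing in for g.ndata['_TYPE'] or g.edata['_TYPE'].
print(check_type_ids([0, 0, 1, 2], num_types=3, name="ntype"))
print(check_type_ids([0, 2, 1, 3], num_types=3, name="etype"))
```

With a real graph, this could be called as `check_type_ids(g.ndata['_TYPE'], num_ntypes, "ntype")` and likewise for `g.edata['_TYPE']` with `num_etypes`. An empty list means neither check found a problem.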

xixi-baba · March 20, 2022, 3:58pm

The error message does not seem to change. Do you plan to provide an official example implementation of the model with HGTConv?

BarclayII · March 21, 2022, 3:33am

It still says “CUDA device-side assert triggered” even if you run on CPU? You could make sure the code runs on CPU by setting the environment variable CUDA_VISIBLE_DEVICES= (i.e. empty string).
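
The same thing can be done from inside a script, as long as the variable is set before PyTorch initializes CUDA. A minimal sketch, where `hide_gpus` is a hypothetical helper name and the PyTorch import is left as a comment so the snippet runs without it installed:

```python
import os


def hide_gpus():
    """Hide all CUDA devices from this process (hypothetical helper)."""
    # An empty string means no device is visible to CUDA.
    os.environ["CUDA_VISIBLE_DEVICES"] = ""
    return os.environ["CUDA_VISIBLE_DEVICES"]


hide_gpus()
# Import torch and dgl only after this point, for example:
#   import torch
#   assert not torch.cuda.is_available()
```

If the error still mentions CUDA after this, the variable was probably set too late, or the model and graph are being moved to a GPU device explicitly somewhere in the script.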

While we do have an example for HGTConv in examples/pytorch/hgt, we would still recommend OpenHGNN’s implementation since ours only handles a small graph.
