CUDA kernel launch error: no kernel image is available for execution on the device

Environment:
DGL: 0.9 (built from source)
PyTorch: 1.11.0
OS: Linux
Python: 3.9.12
CUDA: 11.3
Driver: 510.47.03
GPU: Tried both V100 and A100

[21:43:07] /path/dgl/src/array/cuda/array_op_impl.cu:254: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading: CUDA kernel launch error: no kernel image is available for execution on the device
Stack trace:
  [bt] (0) /path/miniconda3/envs/gnn-p39/lib/python3.9/site-packages/dgl-0.9-py3.9-linux-x86_64.egg/dgl/libdgl.so(+0x8a4478) [0x7f6354a37478]
  [bt] (1) /path/miniconda3/envs/gnn-p39/lib/python3.9/site-packages/dgl-0.9-py3.9-linux-x86_64.egg/dgl/libdgl.so(dgl::runtime::NDArray dgl::aten::impl::Range<(DLDeviceType)2, long>(long, long, DLContext)+0x1e0) [0x7f6354a39560]
  [bt] (2) /path/miniconda3/envs/gnn-p39/lib/python3.9/site-packages/dgl-0.9-py3.9-linux-x86_64.egg/dgl/libdgl.so(dgl::aten::Range(long, long, unsigned char, DLContext)+0x1fd) [0x7f635450314d]
  [bt] (3) /path/miniconda3/envs/gnn-p39/lib/python3.9/site-packages/dgl-0.9-py3.9-linux-x86_64.egg/dgl/libdgl.so(dgl::UnitGraph::COO::Edges(unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const+0x9f) [0x7f63549fbc2f]
  [bt] (4) /path/miniconda3/envs/gnn-p39/lib/python3.9/site-packages/dgl-0.9-py3.9-linux-x86_64.egg/dgl/libdgl.so(dgl::UnitGraph::Edges(unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const+0xb5) [0x7f63549e87c5]
  [bt] (5) /path/miniconda3/envs/gnn-p39/lib/python3.9/site-packages/dgl-0.9-py3.9-linux-x86_64.egg/dgl/libdgl.so(dgl::HeteroGraph::Edges(unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const+0x40) [0x7f63548e3670]
  [bt] (6) /path/miniconda3/envs/gnn-p39/lib/python3.9/site-packages/dgl-0.9-py3.9-linux-x86_64.egg/dgl/libdgl.so(+0x764f82) [0x7f63548f7f82]
  [bt] (7) /path/miniconda3/envs/gnn-p39/lib/python3.9/site-packages/dgl-0.9-py3.9-linux-x86_64.egg/dgl/libdgl.so(DGLFuncCall+0x60) [0x7f6354878350]
  [bt] (8) /path/miniconda3/envs/gnn-p39/lib/python3.9/lib-dynload/../../libffi.so.8(+0x6a4a) [0x7f6458dbea4a]

Sometimes it works, and sometimes I get this error. What causes it?

What makes this odd is that I now get the same error even with the scripts I previously used to train models.
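
For anyone landing here with the same message: "no kernel image is available for execution on the device" is the CUDA runtime reporting that the loaded binary contains no code compiled for the GPU it is running on. With a source build used on both V100 (compute capability 7.0) and A100 (compute capability 8.0), one possible explanation for the "sometimes it works" behaviour is that the build targeted only one of the two architectures. This is a guess, not a confirmed diagnosis of this thread. Below is a minimal sketch of the compatibility rule, with simplifying assumptions: the helper names `parse_arch` and `has_kernel_image` are made up for illustration, and suffixed tags such as `sm_90a` are not handled. The optional part at the end prints what PyTorch reports for the current device. Note that `torch.cuda.get_arch_list()` describes the PyTorch build, not the DGL build, so it only shows the kind of information to compare.

```python
def parse_arch(tag):
    """Split a tag such as 'sm_70' or 'compute_80' into (kind, major, minor)."""
    kind, _, digits = tag.partition("_")
    # The last digit is the minor version, the rest is the major version.
    return kind, int(digits[:-1]), int(digits[-1])


def has_kernel_image(device_cc, compiled_archs):
    """Return True if a binary built for compiled_archs can run on device_cc.

    Simplified rules (hypothetical helper, not a DGL or CUDA API):
      - 'sm_XY' machine code runs on devices with the same major version X
        and a minor version >= Y.
      - 'compute_XY' PTX can be JIT-compiled for any device with
        capability >= X.Y.
    """
    for tag in compiled_archs:
        kind, major, minor = parse_arch(tag)
        if kind == "sm" and major == device_cc[0] and minor <= device_cc[1]:
            return True
        if kind == "compute" and (major, minor) <= tuple(device_cc):
            return True
    return False


V100 = (7, 0)
A100 = (8, 0)

# A build that targets only sm_70 runs on V100 but not on A100.
print(has_kernel_image(V100, ["sm_70"]))
print(has_kernel_image(A100, ["sm_70"]))

# Adding PTX for compute_70 lets the driver JIT-compile for A100.
print(has_kernel_image(A100, ["sm_70", "compute_70"]))

if __name__ == "__main__":
    # Optional: show what PyTorch sees on this machine, if it is installed.
    try:
        import torch
    except ImportError:
        torch = None
    if torch is not None and torch.cuda.is_available():
        print("device capability:", torch.cuda.get_device_capability(0))
        print("PyTorch build archs:", torch.cuda.get_arch_list())
```

If the device capability is not covered by the architectures the failing library was built for, rebuilding with the right targets or installing a build that includes them should make the error go away.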

The issue disappeared after updating the library.

Cool! So everything works now, right?
