CUDA kernel launch error: no kernel image is available for execution on the device

Environment:
DGL: 0.9 (built from source)
PyTorch: 1.11.0
OS: Linux
Python: 3.9.12
CUDA: 11.3
Driver: 510.47.03
GPU: Tried both V100 and A100

[21:43:07] /path/dgl/src/array/cuda/array_op_impl.cu:254: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading: CUDA kernel launch error: no kernel image is available for execution on the device
Stack trace:
  [bt] (0) /path/miniconda3/envs/gnn-p39/lib/python3.9/site-packages/dgl-0.9-py3.9-linux-x86_64.egg/dgl/libdgl.so(+0x8a4478) [0x7f6354a37478]
  [bt] (1) /path/miniconda3/envs/gnn-p39/lib/python3.9/site-packages/dgl-0.9-py3.9-linux-x86_64.egg/dgl/libdgl.so(dgl::runtime::NDArray dgl::aten::impl::Range<(DLDeviceType)2, long>(long, long, DLContext)+0x1e0) [0x7f6354a39560]
  [bt] (2) /path/miniconda3/envs/gnn-p39/lib/python3.9/site-packages/dgl-0.9-py3.9-linux-x86_64.egg/dgl/libdgl.so(dgl::aten::Range(long, long, unsigned char, DLContext)+0x1fd) [0x7f635450314d]
  [bt] (3) /path/miniconda3/envs/gnn-p39/lib/python3.9/site-packages/dgl-0.9-py3.9-linux-x86_64.egg/dgl/libdgl.so(dgl::UnitGraph::COO::Edges(unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const+0x9f) [0x7f63549fbc2f]
  [bt] (4) /path/miniconda3/envs/gnn-p39/lib/python3.9/site-packages/dgl-0.9-py3.9-linux-x86_64.egg/dgl/libdgl.so(dgl::UnitGraph::Edges(unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const+0xb5) [0x7f63549e87c5]
  [bt] (5) /path/miniconda3/envs/gnn-p39/lib/python3.9/site-packages/dgl-0.9-py3.9-linux-x86_64.egg/dgl/libdgl.so(dgl::HeteroGraph::Edges(unsigned long, std::__cxx11::basic_string<char, std::char_traits<char>, std::allocator<char> > const&) const+0x40) [0x7f63548e3670]
  [bt] (6) /path/miniconda3/envs/gnn-p39/lib/python3.9/site-packages/dgl-0.9-py3.9-linux-x86_64.egg/dgl/libdgl.so(+0x764f82) [0x7f63548f7f82]
  [bt] (7) /path/miniconda3/envs/gnn-p39/lib/python3.9/site-packages/dgl-0.9-py3.9-linux-x86_64.egg/dgl/libdgl.so(DGLFuncCall+0x60) [0x7f6354878350]
  [bt] (8) /path/miniconda3/envs/gnn-p39/lib/python3.9/lib-dynload/../../libffi.so.8(+0x6a4a) [0x7f6458dbea4a]

It works in some cases, but in others I get this error. What is the source of this error?

This error is puzzling because I get it even with scripts I previously used to train models.
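For reference, CUDA reports "no kernel image is available for execution on the device" when the compiled library contains neither machine code (SASS) for the GPU's compute capability nor PTX that the driver can JIT-compile for it. V100 is compute capability 7.0 and A100 is 8.0. This thread does not confirm that this was the cause here. The sketch below encodes the simplified compatibility rule in plain Python. `has_kernel_image`, `parse_target`, and the target strings are hypothetical inputs, not a DGL API. For example, the targets could be the architectures passed when building from source, or those listed by `cuobjdump --list-elf` / `cuobjdump --list-ptx` on `libdgl.so`.

```python
# Hypothetical helper: decide whether a binary built for the given CUDA targets
# can run on a GPU with compute capability (major, minor).
# Simplified rules (assumption: ignores arch-specific targets such as sm_90a):
#   - SASS "sm_XY" runs on devices with the same major version and minor >= Y.
#   - PTX "compute_XY" can be JIT-compiled by the driver for any device >= X.Y.


def parse_target(target: str) -> tuple[str, int, int]:
    """Split e.g. 'sm_80' or 'compute_70' into (kind, major, minor)."""
    kind, digits = target.split("_")
    if kind not in ("sm", "compute"):
        raise ValueError(f"unsupported target: {target}")
    # The last digit is the minor version; everything before it is the major.
    return kind, int(digits[:-1]), int(digits[-1])


def has_kernel_image(capability: tuple[int, int], targets: list[str]) -> bool:
    """Return True if any compiled target can execute on this device."""
    major, minor = capability
    for target in targets:
        kind, t_major, t_minor = parse_target(target)
        # Native machine code: same major version, equal or older minor.
        if kind == "sm" and t_major == major and t_minor <= minor:
            return True
        # PTX: the driver can JIT-compile it for any newer or equal device.
        if kind == "compute" and (t_major, t_minor) <= (major, minor):
            return True
    return False


if __name__ == "__main__":
    # V100 = 7.0, A100 = 8.0
    print(has_kernel_image((7, 0), ["sm_61"]))                # False
    print(has_kernel_image((8, 0), ["sm_70"]))                # False
    print(has_kernel_image((8, 0), ["sm_70", "compute_70"]))  # True (PTX JIT)
```

On a live machine, `torch.cuda.get_device_capability()` returns the `(major, minor)` tuple for the current GPU. A build that ships only SASS for older architectures, with no PTX, fails on newer GPUs with exactly this kind of launch error.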

The issue disappeared after updating the library.

Cool! So everything works now, right?