Dgl/src/runtime/cuda/cuda_device_api.cc:103: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading

While training my GNN model I keep getting errors like this:

I am using CUDA 11.1 and PyTorch 1.9.0.

Has anyone had a similar issue? How can I solve it?

Hi, was the CUDA version you selected when installing DGL also 11.1? Does it match the PyTorch build you installed?
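If it helps, here is a rough sketch for collecting the relevant versions in one place. The helper name `collect_versions` is made up for this post, and looking for an installed package whose name starts with `dgl` (for example a CUDA-specific wheel name) is only an assumption about how DGL was installed, so treat the output as a hint rather than proof.

```python
import importlib
import importlib.metadata


def collect_versions():
    """Gather version info for torch and dgl without crashing if an import fails."""
    info = {}
    for name in ("torch", "dgl"):
        try:
            module = importlib.import_module(name)
            info[name] = getattr(module, "__version__", "unknown")
        except Exception as exc:  # a broken install can raise more than ImportError
            info[name] = f"import failed: {exc!r}"

    try:
        import torch
        # CUDA version PyTorch was built against (None for CPU-only builds).
        info["torch_cuda_build"] = torch.version.cuda
        info["cuda_available"] = torch.cuda.is_available()
    except Exception as exc:
        info["torch_cuda_build"] = f"unavailable: {exc!r}"

    # Assumption: the installed DGL package name may reveal its CUDA build.
    dgl_packages = []
    for dist in importlib.metadata.distributions():
        dist_name = dist.metadata["Name"] or ""
        if dist_name.lower().startswith("dgl"):
            dgl_packages.append(f"{dist_name}=={dist.version}")
    info["dgl_packages"] = dgl_packages
    return info


if __name__ == "__main__":
    for key, value in collect_versions().items():
        print(f"{key}: {value}")
```

Posting the printed output here would make it easier to spot a mismatch between the two builds.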

Yes. I installed DGL for CUDA 11.1, so they match.

Are you training in distributed mode? Could you provide a minimal demo that reproduces the issue, or more details such as the call stack? I cannot find any more clues in the screenshot.
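One possible way to capture a more useful call stack, sketched under some assumptions: `run_with_traceback` is a hypothetical wrapper, and `train` in the usage note stands in for your own training entry point. Setting `CUDA_LAUNCH_BLOCKING=1` makes CUDA kernel launches synchronous, so an error is more likely to surface at the call that caused it. It has to be set before CUDA is initialized, and it slows training down, so use it only for debugging.

```python
import os
import traceback

# Set before torch/dgl initialize CUDA; leaves an existing value untouched.
os.environ.setdefault("CUDA_LAUNCH_BLOCKING", "1")


def run_with_traceback(fn, *args, **kwargs):
    """Call fn and return (result, traceback_text).

    traceback_text is None on success; on failure result is None and
    traceback_text holds the full Python traceback as a string.
    """
    try:
        return fn(*args, **kwargs), None
    except Exception:
        return None, traceback.format_exc()
```

Usage would look like `result, tb = run_with_traceback(train)`, followed by pasting `tb` into the thread as text instead of a screenshot.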