CUDA error when running the PDBBind example

Hi DGL community,

I’m very new to this community and to DGL. I’ve been working through the DGL-LifeSci examples to learn the library. The tox21 example ran fine, but when I ran the pdbbind example I hit this error:

  File "/home/ok/anaconda3/envs/dgl/lib/python3.7/site-packages/torch/functional.py", line 682, in _unique_impl
    return_counts=return_counts,
RuntimeError: radix_sort: failed on 2nd step: cudaErrorInvalidValue: invalid argument

Since tox21 ran without this problem, I suspect my CUDA setup is OK (CUDA 11.0).
I’m also using the pdbbind2015 dataset, so I believe the dataset is fine too,
and I can’t figure out where the error is coming from…

Has anyone else run into the same problem?

Did you directly run an example? How can I reproduce the error?
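
One way to narrow this down without the full example: the traceback ends inside `_unique_impl`, which is what `torch.unique` calls, so a tiny script that runs `torch.unique` with `return_counts=True` on the CPU and on the GPU may show whether the failure is independent of DGL. This is only a sketch; the helper name `try_unique` and the sample tensor are made up for illustration, and a tensor this small is not guaranteed to trigger the same `radix_sort` failure as the real data.

```python
import importlib.util


def try_unique(device_name):
    """Run torch.unique(return_counts=True) on one device and report the outcome."""
    if importlib.util.find_spec("torch") is None:
        return "skipped: torch is not installed"
    import torch

    if device_name.startswith("cuda") and not torch.cuda.is_available():
        return "skipped: CUDA is not available"

    # Hypothetical sample data; the real example passes much larger tensors.
    x = torch.tensor([3, 1, 2, 3, 1], device=device_name)
    try:
        values, counts = torch.unique(x, return_counts=True)
        # Copying to Python lists forces any pending GPU work to finish,
        # so a CUDA error is raised inside this try block.
        return f"ok: values={values.tolist()} counts={counts.tolist()}"
    except RuntimeError as err:
        # An error like the radix_sort one in the traceback would land here.
        return f"failed: {err}"


if __name__ == "__main__":
    for name in ("cpu", "cuda"):
        print(name, "->", try_unique(name))
```

If the CPU line reports `ok` and the CUDA line reports `failed`, the problem most likely sits in the PyTorch/CUDA installation rather than in the pdbbind example or the dataset.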

Thanks for your reply.

I followed the instructions on GitHub to install DGL and DGL-LifeSci, then ran
./main.py -m ACNN -d PDBBind_refined_pocket_temporal
(main.py from the dgl-lifesci/examples/binding-affinity-prediction)

Also, I couldn’t download pdbbind2015.tar.gz from behind my proxy, so I downloaded it manually and changed pdbbind.py to point to the downloaded .tar.gz.

I’m running the code on a VERY old GPU (Quadro P4000), so that, together with the NVIDIA driver and the CUDA version, could be the problem. Everything runs perfectly on the CPU.
Googling the error message suggests it could indeed be a conflict in the CUDA setup, so I will try a different machine (GPU) to see if that is it.

Thanks again!

Sounds good. Good luck!
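
For anyone comparing machines with the same error, a short environment report can make the comparison concrete. The sketch below assumes nothing beyond PyTorch itself; the function name `describe_cuda_setup` is invented for this post. It prints the PyTorch version, the CUDA version the PyTorch build was compiled against (which can differ from the system CUDA toolkit), and the GPU's name and compute capability.

```python
import importlib.util


def describe_cuda_setup():
    """Collect version and device details that are useful when comparing machines."""
    report = {}
    if importlib.util.find_spec("torch") is None:
        report["torch"] = None  # PyTorch is not installed in this environment
        return report
    import torch

    report["torch"] = torch.__version__
    # CUDA version this PyTorch build targets; None for CPU-only builds.
    report["built_for_cuda"] = torch.version.cuda
    report["cuda_available"] = torch.cuda.is_available()
    if report["cuda_available"]:
        report["device"] = torch.cuda.get_device_name(0)
        # (major, minor) compute capability of the first visible GPU.
        report["compute_capability"] = torch.cuda.get_device_capability(0)
    return report


if __name__ == "__main__":
    for key, value in describe_cuda_setup().items():
        print(f"{key}: {value}")
```

Running it on both the failing machine and a working one and diffing the output is a cheap way to see whether the PyTorch build, the CUDA version, or the GPU itself differs.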