Error: Check failed: e == CUSPARSE_STATUS_SUCCESS: CUSPARSE ERROR: 10

impchai · October 20, 2023, 3:14am

Hi, I recently ran into a problem. When I use dgl-cu110 with PyTorch 1.10.2 for graph computation, the following error is raised:
y_dis = self.grapg_compute(sub,y_res.permute(0,2,1))
File "/data2/chaihaoye/test/test/fsdd.py", line 339, in grapg_compute
g.update_all(message_func, reduce_func)
File "/data2/chaihaoye/.local/lib/python3.9/site-packages/dgl/heterograph.py", line 4686, in update_all
ndata = core.message_passing(g, message_func, reduce_func, apply_node_func)
File "/data2/chaihaoye/.local/lib/python3.9/site-packages/dgl/core.py", line 283, in message_passing
ndata = invoke_gspmm(g, mfunc, rfunc)
File "/data2/chaihaoye/.local/lib/python3.9/site-packages/dgl/core.py", line 258, in invoke_gspmm
z = op(graph, x)
File "/data2/chaihaoye/.local/lib/python3.9/site-packages/dgl/ops/spmm.py", line 170, in func
return gspmm(g, 'copy_lhs', reduce_op, x, None)
File "/data2/chaihaoye/.local/lib/python3.9/site-packages/dgl/ops/spmm.py", line 62, in gspmm
ret = gspmm_internal(g._graph, op,
File "/data2/chaihaoye/.local/lib/python3.9/site-packages/dgl/backend/pytorch/sparse.py", line 307, in gspmm
return GSpMM.apply(gidx, op, reduce_op, lhs_data, rhs_data)
File "/usr/local/anaconda3/envs/torch-1.13.1-py39/lib/python3.9/site-packages/torch/cuda/amp/autocast_mode.py", line 105, in decorate_fwd
return fwd(*args, **kwargs)
File "/data2/chaihaoye/.local/lib/python3.9/site-packages/dgl/backend/pytorch/sparse.py", line 87, in forward
out, (argX, argY) = _gspmm(gidx, op, reduce_op, X, Y)
File "/data2/chaihaoye/.local/lib/python3.9/site-packages/dgl/sparse.py", line 157, in _gspmm
_CAPI_DGLKernelSpMM(gidx, op, reduce_op,
File "dgl/_ffi/_cython/./function.pxi", line 287, in dgl._ffi._cy3.core.FunctionBase.call
File "dgl/_ffi/_cython/./function.pxi", line 232, in dgl._ffi._cy3.core.FuncCall
File "dgl/_ffi/_cython/./base.pxi", line 155, in dgl._ffi._cy3.core.CALL
dgl._ffi.base.DGLError: [11:09:19] /opt/dgl/src/array/cuda/spmm.cu:213: Check failed: e == CUSPARSE_STATUS_SUCCESS: CUSPARSE ERROR: 10

My code is defined as:

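import dgl.function as fn

def grapg_compute(self, g, feature):
    # Attach the input features to the nodes.
    g.ndata['h'] = feature
    # Copy each source node's 'h' onto its outgoing edges as message 'm'.
    message_func = fn.copy_u(u='h', out='m')
    # Sum the incoming messages at each destination node into 'h_neigh'.
    reduce_func = fn.sum(msg='m', out='h_neigh')
    g.update_all(message_func, reduce_func)

    h = g.ndata['h_neigh']
    return h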

I was able to run this program some time ago, but now it fails with the error above.

dyru · October 25, 2023, 6:42pm

Hi @impchai, could you provide complete environment information, including:

DGL Version (e.g., 1.0):
Backend Library & Version (e.g., PyTorch 0.4.1, MXNet/Gluon 1.3):
OS (e.g., Linux):
How you installed DGL (conda, pip, source):
Build command you used (if compiling from source):
Python version:
CUDA/cuDNN version (if applicable):
GPU models and configuration (e.g., V100):

and a snippet that can reproduce your error?
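
Most of the fields above can be gathered with a short script. The following is only a sketch: the helper name `collect_env` and the default package list are illustrative assumptions, and the wheel name `dgl-cu110` is taken from the original post. It reads installed versions from package metadata and reports the CUDA version PyTorch was built with, if PyTorch can be imported.

```python
import importlib.metadata as md
import platform
import sys


def collect_env(packages=("dgl", "dgl-cu110", "torch")):
    """Return a dict of strings describing the current Python environment.

    `collect_env` and the default package names are illustrative, not part of DGL.
    """
    info = {
        "python": platform.python_version(),
        "os": platform.platform(),
        "executable": sys.executable,  # shows which environment is actually active
    }
    # Look up installed distributions without importing them.
    for name in packages:
        try:
            info[name] = md.version(name)
        except md.PackageNotFoundError:
            info[name] = "not installed"
    # PyTorch is optional here; any import failure is recorded instead of raised.
    try:
        import torch

        info["torch CUDA build"] = str(torch.version.cuda)
        if torch.cuda.is_available():
            info["GPU"] = torch.cuda.get_device_name(0)
        else:
            info["GPU"] = "none visible"
    except Exception as exc:
        info["torch CUDA build"] = f"unavailable ({type(exc).__name__})"
    return info


if __name__ == "__main__":
    for key, value in collect_env().items():
        print(f"{key}: {value}")
```

Running it with the same interpreter that produces the error matters, since the `executable` line shows which environment the packages are being resolved from.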

The error message suggests a faulty installation or an environment problem. You may also try reinstalling by following the guide here
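
One thing worth checking is whether the DGL and PyTorch builds target the same CUDA version. In the traceback, DGL is loaded from `~/.local` while PyTorch comes from a conda environment named `torch-1.13.1-py39`, even though the post mentions PyTorch 1.10.2; the environment name may or may not reflect the installed version. The sketch below compares the CUDA suffix of a wheel name such as `dgl-cu110` with the string reported by `torch.version.cuda`. Both function names are hypothetical helpers, and a mismatch is only one plausible cause of the cuSPARSE failure, not a confirmed diagnosis.

```python
import re


def cuda_version_from_tag(package_name):
    """Turn a wheel suffix such as 'cu110' into '11.0'; return None if absent."""
    match = re.search(r"cu(\d{2,3})$", package_name)
    if match is None:
        return None
    digits = match.group(1)
    # The last digit is the minor version, the rest is the major version.
    return f"{digits[:-1]}.{digits[-1]}"


def builds_match(dgl_package, torch_cuda):
    """Compare the DGL wheel's CUDA tag with PyTorch's CUDA build string.

    `torch_cuda` is expected to be the value of `torch.version.cuda`,
    which is None for CPU-only PyTorch builds.
    """
    dgl_cuda = cuda_version_from_tag(dgl_package)
    if dgl_cuda is None or torch_cuda is None:
        return False
    # Compare major.minor only, ignoring any patch component.
    return dgl_cuda == ".".join(torch_cuda.split(".")[:2])


if __name__ == "__main__":
    # Example values; substitute the ones from your own environment.
    print(builds_match("dgl-cu110", "11.0"))
    print(builds_match("dgl-cu110", "11.7"))
```

If the two versions differ, reinstalling both packages into a single clean environment removes that variable before digging further.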
