DGLError: Check failed: allow_missing: Device API cuda is not enabled. Please install the cuda version of dgl

AndreaBasile97 · May 5, 2023, 10:54am

Hello everyone! I’m running into this problem. I installed the GPU version using these commands:
! pip install dgl -f https://data.dgl.ai/wheels/cu118/repo.html
! pip install dglgo -f https://data.dgl.ai/wheels-test/repo.html
I also double-checked that the CPU version of DGL was correctly UNINSTALLED.

The problem appears only when I’m using a V100 GPU on Colab, not with a T4.

Since Colab decides which GPU I get, is there a way to make DGL work with the GPU I’m actually assigned?

---------------------------------------------------------------------------
DGLError                                  Traceback (most recent call last)
<ipython-input-23-f497be3f5b48> in <cell line: 9>()
      7 weight_decay = 0.01  # weight decay
      8 lr_decay_rate = 0.0001  # learning rate decay rate
----> 9 trained_model = train(batched_graph_list, batched_ids, validation_graph_list, val_batched_ids, in_features, hidden_features, num_heads, num_classes, epochs, learning_rate, weight_decay, lr_decay_rate)

2 frames
/usr/local/lib/python3.10/dist-packages/dgl/heterograph_index.py in copy_to(self, ctx)
    253             The graph index on the given device context.
    254         """
--> 255         return _CAPI_DGLHeteroCopyTo(self, ctx.device_type, ctx.device_id)
    256 
    257     def pin_memory(self):

dgl/_ffi/_cython/./function.pxi in dgl._ffi._cy3.core.FunctionBase.__call__()

dgl/_ffi/_cython/./function.pxi in dgl._ffi._cy3.core.FuncCall()

dgl/_ffi/_cython/./function.pxi in dgl._ffi._cy3.core.FuncCall3()

DGLError: [10:47:07] /opt/dgl/src/runtime/c_runtime_api.cc:82: Check failed: allow_missing: Device API cuda is not enabled. Please install the cuda version of dgl.
Stack trace:
  [bt] (0) /usr/local/lib/python3.10/dist-packages/dgl/libdgl.so(dmlc::LogMessageFatal::~LogMessageFatal()+0x75) [0x7fdcde348e55]
  [bt] (1) /usr/local/lib/python3.10/dist-packages/dgl/libdgl.so(dgl::runtime::DeviceAPIManager::GetAPI(std::string, bool)+0x1f2) [0x7fdcde6c85f2]
  [bt] (2) /usr/local/lib/python3.10/dist-packages/dgl/libdgl.so(dgl::runtime::DeviceAPI::Get(DGLContext, bool)+0x1e1) [0x7fdcde6c2ba1]
  [bt] (3) /usr/local/lib/python3.10/dist-packages/dgl/libdgl.so(dgl::runtime::NDArray::Empty(std::vector<long, std::allocator<long> >, DGLDataType, DGLContext)+0x13b) [0x7fdcde6e5acb]
  [bt] (4) /usr/local/lib/python3.10/dist-packages/dgl/libdgl.so(dgl::runtime::NDArray::CopyTo(DGLContext const&) const+0xc3) [0x7fdcde71fe23]
  [bt] (5) /usr/local/lib/python3.10/dist-packages/dgl/libdgl.so(dgl::UnitGraph::CopyTo(std::shared_ptr<dgl::BaseHeteroGraph>, DGLContext const&)+0x3ef) [0x7fdcde82d79f]
  [bt] (6) /usr/local/lib/python3.10/dist-packages/dgl/libdgl.so(dgl::HeteroGraph::CopyTo(std::shared_ptr<dgl::BaseHeteroGraph>, DGLContext const&)+0xf6) [0x7fdcde731286]
  [bt] (7) /usr/local/lib/python3.10/dist-packages/dgl/libdgl.so(+0x52cbb6) [0x7fdcde740bb6]
  [bt] (8) /usr/local/lib/python3.10/dist-packages/dgl/libdgl.so(DGLFuncCall+0x48) [0x7fdcde6c7bb8]

Rhett-Ying · May 5, 2023, 12:08pm

Which version are you using? Please check with python3 -c 'import dgl;print(dgl.__version__)'

And could you check whether a GPU is available via python3 -c 'import torch;print(torch.cuda.device_count())'
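
The two one-liners above can also be combined into a single notebook cell. The following is only a sketch: the helper name describe_environment and the dictionary keys are made up for illustration, while torch.version.cuda, torch.cuda.device_count() and torch.cuda.get_device_name() are the standard PyTorch attributes. Packages that cannot be imported are reported as None rather than raising.

```python
import importlib


def describe_environment():
    """Collect the version facts asked for above; missing packages stay None."""
    report = {
        "dgl": None,         # e.g. a string ending in "+cu118" for a CUDA wheel
        "torch": None,       # PyTorch version string
        "torch_cuda": None,  # CUDA version PyTorch was built against (None for CPU builds)
        "gpu_count": 0,
        "gpu_names": [],
    }
    try:
        torch = importlib.import_module("torch")
    except Exception:
        torch = None
    if torch is not None:
        report["torch"] = str(torch.__version__)
        report["torch_cuda"] = torch.version.cuda
        report["gpu_count"] = int(torch.cuda.device_count())
        report["gpu_names"] = [
            torch.cuda.get_device_name(i) for i in range(report["gpu_count"])
        ]
    try:
        dgl = importlib.import_module("dgl")
        report["dgl"] = str(dgl.__version__)
    except Exception:
        pass  # DGL missing or failing to import: leave the entry as None
    return report


for key, value in describe_environment().items():
    print(f"{key}: {value}")
```

Posting the full output (including the GPU name) makes it easier to compare a working T4 session with a failing V100 session.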

AndreaBasile97 · May 5, 2023, 1:01pm

1.1.0+cu118
1 device found

I want to stress that the problem is with the V100 GPU. It does not appear when using an A100 or T4.

Rhett-Ying · May 6, 2023, 2:29am

So torch.rand(3,2).to('cuda') works but dgl.rand_graph(10,20).to('cuda') crashes when running on Colab with a V100?

What torch version are you using?

After changing the runtime configuration to V100, have you tried reinstalling DGL (uninstall, then install)?
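
A minimal way to run the two calls from this post side by side, so that a failure in one does not hide the result of the other. The helper try_step is hypothetical and only wraps each call in a try/except; the calls themselves are the ones quoted above.

```python
def try_step(label, fn):
    """Run fn() and report success or the first line of the error message."""
    try:
        fn()
        return True, f"{label}: ok"
    except Exception as exc:
        first_line = (str(exc).splitlines() or [""])[0]
        return False, f"{label}: {type(exc).__name__}: {first_line}"


def torch_to_cuda():
    import torch
    torch.rand(3, 2).to("cuda")  # plain tensor copy to the GPU


def dgl_to_cuda():
    import dgl
    dgl.rand_graph(10, 20).to("cuda")  # graph copy to the GPU, the step that fails in the traceback


if __name__ == "__main__":
    for label, fn in [("torch tensor", torch_to_cuda), ("dgl graph", dgl_to_cuda)]:
        ok, message = try_step(label, fn)
        print(message)
```

If the torch line succeeds and the dgl line fails with "Device API cuda is not enabled", the GPU itself is visible and the issue is in the DGL build that the session actually imported.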

AndreaBasile97 · May 6, 2023, 10:41am

Yes
torch 11.8
Yes, I tried reinstalling DGL every time. To repeat, this error happens only with the V100 (probably this GPU doesn’t support the CUDA version of DGL).

Rhett-Ying · May 7, 2023, 1:35am

torch 11.8? Or CUDA 11.8? DGL requires PyTorch 1.12.0+.
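
To keep the two numbers apart: a version string such as the 1.1.0+cu118 reported earlier has a release part (1.1.0) and a local build tag (cu118, the CUDA toolkit the wheel targets). The sketch below splits such strings and checks a PyTorch version against the 1.12.0 minimum mentioned above. The names split_version, meets_minimum and MIN_TORCH are illustrative, not part of DGL or PyTorch.

```python
import re

MIN_TORCH = (1, 12, 0)  # minimum PyTorch version stated in this thread


def split_version(version):
    """Split 'X.Y.Z+tag' into ((X, Y, Z), 'tag'); tag is None when absent."""
    public, _, local = version.partition("+")
    numbers = [int(n) for n in re.findall(r"\d+", public)[:3]]
    numbers += [0] * (3 - len(numbers))  # pad '1.12' to (1, 12, 0)
    return tuple(numbers), (local or None)


def meets_minimum(version, minimum=MIN_TORCH):
    """True if the release part of the version is at least the minimum."""
    return split_version(version)[0] >= minimum


print(split_version("1.1.0+cu118"))   # DGL version reported in this thread
print(meets_minimum("1.11.0"))
print(meets_minimum("2.0.0+cu118"))
```

In a live session, the PyTorch version comes from torch.__version__ and the CUDA version it was built against from torch.version.cuda; those are the two values the question above is asking for.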
