Running DGL with ROCm (AMD GPUs)

Hi! I am trying to run DGL on an HPC cluster with AMD GPU nodes, using ROCm instead of CUDA.

It triggers the following error:

dgl._ffi.base.DGLError: [20:17:01] /opt/dgl/src/runtime/c_runtime_api.cc:88: Check failed: allow_missing: Device API gpu is not enabled. Please install the cuda version of dgl.

I am using a Singularity container in which I installed PyTorch from the ROCm wheels (pip3 install torch torchvision==0.11.1 -f https://download.pytorch.org/whl/rocm4.2/torch_stable.html) and DGL with conda install -c dglteam dgl.
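As a sanity check inside the container, the following diagnostic sketch reports which packages are importable and whether PyTorch sees a GPU. Note that ROCm builds of PyTorch expose AMD GPUs through the torch.cuda namespace, so torch.cuda.is_available() covers both vendors; the function name here is my own, not part of either library.

```python
def check_gpu_backends():
    """Report whether torch/dgl are importable and whether a GPU is visible.

    Diagnostic sketch only: ROCm builds of PyTorch route HIP devices through
    the torch.cuda API, so is_available() works on AMD GPUs too.
    """
    report = {}
    try:
        import torch
        report["torch"] = torch.__version__
        report["gpu_available"] = torch.cuda.is_available()
    except ImportError:
        report["torch"] = None  # torch is not installed in this environment
    try:
        import dgl
        report["dgl"] = dgl.__version__
    except ImportError:
        report["dgl"] = None  # dgl is not installed in this environment
    return report

print(check_gpu_backends())
```

Even when the GPU is visible to PyTorch, the error above comes from DGL itself: the conda dgl package is a CPU-only build, so its runtime has no GPU device API compiled in.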

I was wondering if you have encountered this issue before and if you know how to solve it. Thanks!

Hi,

Sorry, we don’t support AMD GPUs at the moment, because all of our GPU kernels are written in CUDA and built with its toolchain. For now you can only use the CPU build.

Oh I see, that makes sense! Thanks for your reply, @VoVAllen!

Do you plan to support AMD GPUs in the near future? I have access to an AMD HPC system and would be happy to help with debugging if so. Apparently there is a HIPIFY tool that helps translate CUDA into HIP/ROCm semi-automatically: GitHub - ROCm-Developer-Tools/HIPIFY: HIPIFY: Convert CUDA to Portable C++ Code

https://rocmdocs.amd.com/en/latest/Programming_Guides/HIP-porting-guide.html
