Running DGL with ROCm (AMD GPUs)

Hi! I am trying to run DGL on an HPC cluster with AMD GPU nodes, using ROCm instead of CUDA.

It triggers the following error:

dgl._ffi.base.DGLError: [20:17:01] /opt/dgl/src/runtime/c_runtime_api.cc:88: Check failed: allow_missing: Device API gpu is not enabled. Please install the cuda version of dgl.

I am using a Singularity container in which I installed PyTorch from the ROCm wheels (pip3 install torch torchvision==0.11.1 -f https://download.pytorch.org/whl/rocm4.2/torch_stable.html) and DGL with conda install -c dglteam dgl.
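As a sanity check inside the container, the following diagnostic sketch reports which packages are importable and whether PyTorch sees a GPU. Note that ROCm builds of PyTorch expose AMD GPUs through the torch.cuda namespace, so torch.cuda.is_available() covers both vendors; the function name here is my own, not part of either library.

```python
def check_gpu_backends():
    """Report whether torch/dgl are importable and whether a GPU is visible.

    Diagnostic sketch only: ROCm builds of PyTorch route HIP devices through
    the torch.cuda API, so is_available() works on AMD GPUs too.
    """
    report = {}
    try:
        import torch
        report["torch"] = torch.__version__
        report["gpu_available"] = torch.cuda.is_available()
    except ImportError:
        report["torch"] = None  # torch is not installed in this environment
    try:
        import dgl
        report["dgl"] = dgl.__version__
    except ImportError:
        report["dgl"] = None  # dgl is not installed in this environment
    return report

print(check_gpu_backends())
```

Even when the GPU is visible to PyTorch, the error above comes from DGL itself: the conda dgl package is a CPU-only build, so its runtime has no GPU device API compiled in.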

I was wondering if you have encountered this issue before and if you know how to solve it. Thanks!

Hi,

Sorry, we don’t support AMD GPUs at the moment, because all of our GPU kernels are written in CUDA and built with its toolchain. For now you can only use the CPU build.

Oh I see, that makes sense! Thanks for your reply, @VoVAllen!

Do you plan to support AMD GPUs in the near future? I have access to an AMD HPC system and would be happy to help with debugging if so. Apparently there is a HIPIFY tool that helps translate CUDA into HIP/ROCm semi-automatically: GitHub - ROCm-Developer-Tools/HIPIFY: HIPIFY: Convert CUDA to Portable C++ Code

https://rocmdocs.amd.com/en/latest/Programming_Guides/HIP-porting-guide.html
