Cannot Find DGL C++ graphbolt library

I’m running a program and have run into an error with DGL. Every time I run the program I get: FileNotFoundError: Cannot find DGL C++ graphbolt library at /home/account/miniconda3/envs/ATOMRefine/lib/python3.10/site-packages/dgl/graphbolt/libgraphbolt_pytorch_#2.1.2.post301.so

I confirmed that the graphbolt directory contains a libgraphbolt_pytorch_#2.1.2.so file.

I am also seeing this bug. I cannot solve it even after trying different versions of torch (2.3 and 2.2.2).

The root cause is the .post301 suffix, which is not expected. Did you install PyTorch with conda from conda-forge instead of the pytorch channel? Could you try the pytorch channel?
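To illustrate why the suffix matters: the error messages in this thread suggest that the library filename is derived from the installed torch version string. The sketch below is not DGL's actual loader code. `expected_graphbolt_filename` is a hypothetical helper, and the rule that only a "+local" build tag (such as "+cu121") is stripped is an assumption made for illustration.

```python
def expected_graphbolt_filename(torch_version: str) -> str:
    """Hypothetical helper: guess the GraphBolt library name for a torch version.

    Assumption: the name is built from the version string with any
    "+local" build tag (for example "+cu121") removed.
    """
    public_version = torch_version.split("+", maxsplit=1)[0]
    return f"libgraphbolt_pytorch_{public_version}.so"


# A plain version with a CUDA build tag maps to the file that exists on disk.
print(expected_graphbolt_filename("2.1.2+cu121"))    # libgraphbolt_pytorch_2.1.2.so
# A ".postNNN" suffix is not a "+local" tag, so it stays in the filename.
print(expected_graphbolt_filename("2.1.2.post301"))  # libgraphbolt_pytorch_2.1.2.post301.so
```

Under that assumption, a torch build reporting 2.1.2.post301 asks for a file that a DGL package built against plain 2.1.2 does not ship.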

As @Rhett-Ying has hinted, it’s not quite clear where you got your pytorch from. If you are using conda-forge, I suggest you also source dgl from conda-forge. Mixing conda-forge packages with pip packages can work, but it also breaks frequently. Judging from the URL, it also seems you are installing a CUDA-based wheel but don’t have CUDA installed.

I suggest you try

mamba create -n my-environment -c conda-forge dgl

(or use conda instead of mamba if you prefer).

If you do want to debug the issue further, you can check whether all the dependent libraries are found by running ldd /home/account/miniconda3/envs/ATOMRefine/lib/python3.10/site-packages/dgl/graphbolt/libgraphbolt_pytorch_#2.1.2.post301.so.

I have a hunch that it will show a cuda library that cannot be found.
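If the ldd output is long, the unresolved entries can be filtered out with a few lines of Python. This is a minimal sketch: `missing_shared_libs` and `run_ldd` are hypothetical helper names, and the sample text is made up to mimic the usual "name => not found" format, it is not output from the environment in this thread.

```python
import subprocess


def missing_shared_libs(ldd_output: str) -> list:
    """Return the library names that ldd reports as "not found"."""
    missing = []
    for line in ldd_output.splitlines():
        if "=>" in line and "not found" in line:
            missing.append(line.split("=>", maxsplit=1)[0].strip())
    return missing


def run_ldd(library_path: str) -> str:
    """Run ldd on a shared library and return its text output (Linux only)."""
    result = subprocess.run(
        ["ldd", library_path], capture_output=True, text=True, check=False
    )
    return result.stdout


# Made-up sample in ldd's usual format, for illustration only.
SAMPLE_LDD_OUTPUT = (
    "\tlibc.so.6 => /lib/x86_64-linux-gnu/libc.so.6 (0x00007f0000000000)\n"
    "\tlibcudart.so.12 => not found\n"
)
print(missing_shared_libs(SAMPLE_LDD_OUTPUT))  # ['libcudart.so.12']
```

On a real system, `missing_shared_libs(run_ldd(path_to_library))` would list whatever the dynamic linker cannot resolve for that file.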

I have been having exactly the same problem for months and couldn’t find a way to solve it. I also downloaded the official DGL docker image from NVIDIA, “nvcr.io/nvidia/dgl:24.05-py3”, and installed torch and dgl there using pip:

torch==2.3.1
torch-tensorrt @ file:///opt/pytorch/torch_tensorrt/dist/torch_tensorrt-2.4.0a0-cp310-cp310-linux_x86_64.whl#sha256=e62c367c26869d8358a50e20e72279d2f28abd0596194554736156639434cf4d
torchaudio==2.3.1
torchdata==0.7.1
torchvision==0.18.1
dgl @ file:///opt/dgl/dgl-source/python/dist/dgl-2.2.1-cp310-cp310-linux_x86_64.whl#sha256=034e429957903526b5fd890f74c80430b78663ad50afcb9056e239b3b9b35428

As soon as I import dgl I get the following error:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/local/lib/python3.10/dist-packages/dgl/__init__.py", line 16, in <module>
    from . import (
  File "/usr/local/lib/python3.10/dist-packages/dgl/dataloading/__init__.py", line 13, in <module>
    from .dataloader import *
  File "/usr/local/lib/python3.10/dist-packages/dgl/dataloading/dataloader.py", line 27, in <module>
    from ..distributed import DistGraph
  File "/usr/local/lib/python3.10/dist-packages/dgl/distributed/__init__.py", line 5, in <module>
    from .dist_graph import DistGraph, DistGraphServer, edge_split, node_split
  File "/usr/local/lib/python3.10/dist-packages/dgl/distributed/dist_graph.py", line 11, in <module>
    from .. import backend as F, graphbolt as gb, heterograph_index
  File "/usr/local/lib/python3.10/dist-packages/dgl/graphbolt/__init__.py", line 36, in <module>
    load_graphbolt()
  File "/usr/local/lib/python3.10/dist-packages/dgl/graphbolt/__init__.py", line 26, in load_graphbolt
    raise FileNotFoundError(
FileNotFoundError: Cannot find DGL C++ graphbolt library at /usr/local/lib/python3.10/dist-packages/dgl/graphbolt/libgraphbolt_pytorch_2.3.1.so

Please help me fix it. It’s unacceptable that even the docker image recommended for building containers that work with dgl has this problem with graphbolt.

torch 2.3.1 will be supported in the DGL release 2.3, which is going to be released in a few weeks. You should try an older container or an older torch version such as 2.3.0.

@bepisworld Sorry about your experience with the container and graphbolt. torch and graphbolt are already set up in the container, so no pip installation is needed. You should be able to run dgl with graphbolt directly.

Could you share if you can run the graphbolt example in /opt/dgl/dgl-source/examples/multigpus/graphbot/node_classifiation.py ?
You need to pip install torchmetrics to run this example though.

The container builds dgl and graphbolt from source instead of using pip installs. I would love to hear back from you.

I managed to make the container work, but now I need to build another container starting from a public.ecr.aws/lambda/python:3.9 base image, and I ended up with the same problem as before.

Would you mind sharing the Dockerfile you use to build from the aws python:3.9 container? I would like to check whether I can reproduce your issue and find the root cause.

Not to pile on, but I’m in the same boat here.
I’m on macOS (Sonoma).
I have tried to install through conda with various versions of python (3.10, 3.12) and various versions of torch (2.2, 2.1). I have also tried to build from source (following the macOS steps).
In all instances something dylib-related doesn’t work.
When doing the conda install I get the graphbolt issue mentioned above.
When trying to build from source I get:

RuntimeError: Cannot find the files.
List of candidates:
/Users/<ME>/Code/DGL_dev/dgl/python/dgl/libdgl.dylib
/Users/<ME>/Code/DGL_dev/dgl/build/libdgl.dylib
/Users/<ME>/Code/DGL_dev/dgl/build/Release/libdgl.dylib
/Users/<ME>/Code/DGL_dev/dgl/lib/libdgl.dylib
/Users/<ME>/Code/DGL_dev/libdgl.dylib

I’ve been trying to get this going for a few days now, but I lack the skills to progress it beyond this.

I found a way to resolve the issue: use a more recent base image such as public.ecr.aws/lambda/python:3.12.

Still, installing the DGL library in a production environment is a nightmare if we need to check every base image we want to use to make the library work.

I also faced this issue several times and, after several hours, finally solved it for my use case.

In my case, I rented a cloud machine and selected the following image:

pytorch/pytorch_2.2.1-cuda12.1-cudnn8-devel
(Note that a similar image, pytorch/pytorch_2.2.1-cuda12.1-cudnn8-runtime, somehow still has the graphbolt problem.)

After that, I just installed dgl with pip from the wheel index, and it works for me:
!pip install dgl -f https://data.dgl.ai/wheels/torch-2.2/cu121/repo.html

(In my case, PyTorch 2.2.1 is the only version that has worked; I haven’t succeeded with 2.2.2 or 2.3.0.)
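For anyone adapting the pip command to another setup, here is a small sketch that builds the wheel index URL from a torch version and a CUDA tag. The URL pattern is extrapolated from the single URL quoted above, so treat it as an assumption and check that the resulting page exists before relying on it. `dgl_wheel_index_url` is a hypothetical helper name.

```python
def dgl_wheel_index_url(torch_version: str, cuda_tag: str) -> str:
    """Hypothetical helper: build a wheel index URL for a torch/CUDA pair.

    Assumption: the index is keyed by torch "major.minor" and a CUDA tag
    such as "cu121", following the one URL quoted in this thread.
    """
    public_version = torch_version.split("+", maxsplit=1)[0]
    major, minor = public_version.split(".")[:2]
    return f"https://data.dgl.ai/wheels/torch-{major}.{minor}/{cuda_tag}/repo.html"


url = dgl_wheel_index_url("2.2.1", "cu121")
print(f"pip install dgl -f {url}")
# pip install dgl -f https://data.dgl.ai/wheels/torch-2.2/cu121/repo.html
```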

I always use pytorch images from here and never have a problem: PyTorch | NVIDIA NGC

Note: I always build from source during my development workflow though.

The issue is caused by a mismatch between the PyTorch version in use and the PyTorch version GraphBolt was built against. Currently, DGL requires the two versions to match. The team is working on fixing the problem by relaxing this constraint, and we will release the fix in the upcoming 2.3. Please stay tuned.
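A quick way to see such a mismatch is to compare the torch version in use with the library files present in the graphbolt folder. The sketch below assumes the `libgraphbolt_pytorch_<version>.so` naming seen in the error messages in this thread. The helper names are hypothetical, and the demo runs on a temporary directory with a made-up file rather than a real installation.

```python
import re
import tempfile
from pathlib import Path

# Assumed naming scheme, taken from the filenames quoted in this thread.
_LIB_NAME = re.compile(r"^libgraphbolt_pytorch_(?P<version>.+)\.(?:so|dylib)$")


def shipped_torch_versions(graphbolt_dir):
    """List the torch versions for which a GraphBolt library file is present."""
    versions = []
    for entry in sorted(Path(graphbolt_dir).iterdir()):
        match = _LIB_NAME.match(entry.name)
        if match:
            versions.append(match.group("version"))
    return versions


def torch_version_is_covered(torch_version, graphbolt_dir):
    """True if a library exists for this torch version ("+local" tag ignored)."""
    public_version = torch_version.split("+", maxsplit=1)[0]
    return public_version in shipped_torch_versions(graphbolt_dir)


# Demo on a throwaway directory; on a real install, pass the "graphbolt"
# folder inside the installed dgl package and the value of torch.__version__.
with tempfile.TemporaryDirectory() as demo_dir:
    (Path(demo_dir) / "libgraphbolt_pytorch_2.2.1.so").touch()
    print(shipped_torch_versions(demo_dir))                    # ['2.2.1']
    print(torch_version_is_covered("2.2.1+cu121", demo_dir))   # True
    print(torch_version_is_covered("2.3.1", demo_dir))         # False
```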
