Speed for RGCN on CPU and GPU

For the original RGCN implementation, the author claims that running on CPU is faster than on GPU, since GPUs are not essential for sparse operations. I am wondering, for DGL's implementation, will the speed differ much between CPU and GPU when using RGCN on a large graph (nodes: 30,000, edges: 40,000)? Thanks.

The RGCN paper was written when deep learning frameworks were not yet ready for sparse computations.

DGL has optimized the commonly used sparse operations, so training an RGCN on GPU is much faster than training it on CPU.

See Table 3 in the DGL design paper: https://arxiv.org/pdf/1909.01315.pdf. RGCN on GPU is 30-40x faster than on CPU for the ogbn-proteins graph.
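For intuition, the operation those frameworks struggled with is the sparse-dense matrix multiply (SpMM) at the heart of GNN message passing: aggregate neighbor features through a sparse adjacency matrix, then apply a dense transform. A minimal CPU sketch with SciPy (toy graph and made-up feature sizes, not DGL's actual kernels):

```python
import numpy as np
import scipy.sparse as sp

# Toy directed graph: 4 nodes, 5 edges, adjacency stored in CSR format.
rows = np.array([0, 0, 1, 2, 3])
cols = np.array([1, 2, 0, 3, 2])
vals = np.ones(5, dtype=np.float32)
adj = sp.csr_matrix((vals, (rows, cols)), shape=(4, 4))

# Node features and a (hypothetical) layer weight matrix.
feat = np.arange(8, dtype=np.float32).reshape(4, 2)
weight = np.eye(2, dtype=np.float32)

# One GCN-style layer: SpMM to aggregate neighbors, then a dense transform.
out = adj @ (feat @ weight)
print(out.shape)  # (4, 2)
```

Modern GPU kernels (cuSPARSE, and DGL's own fused kernels) make exactly this pattern fast, which is where the speedup over CPU comes from.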

Thanks for your reply! I am also wondering whether my graph is relatively big, as I got an out-of-memory issue when trying to train an RGCN model with 30,000 nodes and 40,000 edges on a 16G GPU. Thanks!

Note that there is a low_mem flag in our RelGraphConv module:

Set it to True and see if the OOM issue still exists.
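For intuition about why this flag saves memory (a NumPy sketch of the general idea, not DGL's actual implementation; all sizes are made up): the naive RGCN message computation gathers a separate weight matrix for every edge, materializing an (edges, in_dim, out_dim) tensor. The low-memory variant instead groups edges by relation type and does one matmul per relation.

```python
import numpy as np

rng = np.random.default_rng(0)
num_edges, num_rels, in_dim, out_dim = 1000, 4, 8, 8

src_feat = rng.normal(size=(num_edges, in_dim)).astype(np.float32)
etype = rng.integers(0, num_rels, size=num_edges)          # relation id per edge
weight = rng.normal(size=(num_rels, in_dim, out_dim)).astype(np.float32)

# Naive: gather one (in_dim, out_dim) weight per edge -> O(E * in * out) memory.
naive = np.einsum("ei,eio->eo", src_feat, weight[etype])

# Low-memory: one matmul per relation; no per-edge weight tensor is materialized.
low_mem = np.empty((num_edges, out_dim), dtype=np.float32)
for r in range(num_rels):
    mask = etype == r
    low_mem[mask] = src_feat[mask] @ weight[r]

print(np.allclose(naive, low_mem, atol=1e-5))  # True
```

Both paths compute the same messages; the second trades a per-edge gather for a loop over relation types, which matters when the number of edges is large relative to GPU memory.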

Thanks! I was just wondering, is this the right place for the rgcn example?

I think so, but by default the low_mem option is not activated.

When I activated low_mem in the rgcn example, it gives me the error message "dgl/_ffi/_cython/./base.pxi", line 155, in dgl._ffi._cy3.core.CALL dgl._ffi.base.DGLError: [23:40:23] /home/coulombc/wheels_builder/tmp.30875/python-3.8/dgl/src/array/cuda/utils.cu:19: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading: CUDA: invalid device ordinal
I am using python==3.8, torch==1.7, cuda==10.2

If low_mem is off, then it returns an OOM error: RuntimeError: CUDA out of memory. Tried to allocate 6.08 GiB (GPU 0; 15.78 GiB total capacity; 9.77 GiB already allocated; 4.61 GiB free; 9.92 GiB reserved in total by PyTorch)

Sorry, but DGL has some compatibility problems with PyTorch 1.7.
A patch version will be released this week; before that, you can try the DGL nightly build via:

pip install --pre dgl-cu102

Thanks! I downgraded torch to 1.6 and it works!