For the original RGCN implementation, the authors claim that running on CPU is faster than on GPU because GPU support for sparse operations was not essential at the time. I am wondering, for DGL's implementation, will the speed differ much between CPU and GPU when using RGCN on a large graph (nodes: 30,000, edges: 40,000)? Thanks.
The RGCN paper was written when deep learning frameworks were not ready for sparse computations.
DGL has optimized the commonly used sparse operations, so training an RGCN on GPU is much faster than training it on CPU.
See Table 3 in the DGL design paper: https://arxiv.org/pdf/1909.01315.pdf. RGCN on GPU is 30-40x faster than on CPU for the ogbn-proteins graph.
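If you want to check on your own hardware, here is a minimal timing sketch (not the paper's benchmark setup); the feature size (64) and number of relations (10) are made-up values, while the graph size matches the one in your question:

```python
# Rough CPU-vs-GPU timing sketch for one RelGraphConv layer.
# Sizes are illustrative; only the node/edge counts come from the question.
import time
import torch
import dgl
from dgl.nn.pytorch import RelGraphConv

num_nodes, num_edges, num_rels = 30000, 40000, 10
g = dgl.rand_graph(num_nodes, num_edges)          # random graph of that size
feat = torch.randn(num_nodes, 64)                 # node features (made up)
etypes = torch.randint(0, num_rels, (num_edges,)) # random edge types
conv = RelGraphConv(64, 64, num_rels, regularizer='basis', num_bases=4)

devices = ['cpu'] + (['cuda'] if torch.cuda.is_available() else [])
for device in devices:
    g_d, feat_d, etypes_d = g.to(device), feat.to(device), etypes.to(device)
    conv_d = conv.to(device)
    conv_d(g_d, feat_d, etypes_d)   # warm-up run (excludes one-time overhead)
    if device == 'cuda':
        torch.cuda.synchronize()
    start = time.time()
    for _ in range(10):
        conv_d(g_d, feat_d, etypes_d)
    if device == 'cuda':
        torch.cuda.synchronize()    # wait for async CUDA kernels to finish
    print(device, (time.time() - start) / 10, 'sec per forward pass')
```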
Thanks for your reply! I am also wondering whether my graph is relatively big, as I got an out-of-memory issue when trying to train an RGCN model on 30,000 nodes and 40,000 edges on a 16 GB GPU. Thanks!
Note that there is a low_mem flag in our RelGraphConv module. Set it to True and see if the OOM issue still exists.
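For reference, a minimal sketch of constructing the layer with that flag (the flag exists in DGL 0.5.x-era releases; all sizes below are illustrative):

```python
# Hedged sketch: enabling low_mem on RelGraphConv (DGL 0.5.x-era API).
from dgl.nn.pytorch import RelGraphConv

conv = RelGraphConv(
    in_feat=64,           # input feature size (illustrative)
    out_feat=64,          # output feature size (illustrative)
    num_rels=10,          # number of relation types (illustrative)
    regularizer='basis',
    num_bases=4,
    low_mem=True,         # process edges per relation type instead of
                          # gathering a per-edge weight tensor, trading
                          # speed for a smaller peak memory footprint
)
```

Roughly, the low-memory path applies each relation's weight matrix to its own edges in turn, so it avoids materializing the large per-edge weight tensor that usually causes this kind of OOM.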
I think so, but by default the low_mem option is not activated.
When I activated low_mem in the RGCN example, it gave me this error message:

```
dgl/_ffi/_cython/./base.pxi", line 155, in dgl._ffi._cy3.core.CALL
dgl._ffi.base.DGLError: [23:40:23] /home/coulombc/wheels_builder/tmp.30875/python-3.8/dgl/src/array/cuda/utils.cu:19: Check failed: e == cudaSuccess || e == cudaErrorCudartUnloading: CUDA: invalid device ordinal
```
I am using python==3.8, torch==1.7, cuda==10.2
If low_mem is off, it returns an OOM error:

```
RuntimeError: CUDA out of memory. Tried to allocate 6.08 GiB (GPU 0; 15.78 GiB total capacity; 9.77 GiB already allocated; 4.61 GiB free; 9.92 GiB reserved in total by PyTorch)
```
Sorry, but DGL has a compatibility problem with PyTorch 1.7.
A patch version will be released this week; before then, you can try the DGL nightly build via:

```
pip install --pre dgl-cu102
```
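After installing, a quick sanity check of the stack (a minimal sketch; it just prints whatever versions are actually installed) is:

```python
# Print the installed DGL, PyTorch, and CUDA versions to confirm
# the environment matches what the patch/nightly build expects.
import dgl
import torch

print("dgl:", dgl.__version__)
print("torch:", torch.__version__)
print("cuda:", torch.version.cuda)
print("gpu available:", torch.cuda.is_available())
```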
Thanks! I downgraded torch to 1.6 and it works!