Why is DGL faster than PyTorch or MXNet?

I noticed that you compared DGL with the best open-source implementations and that DGL was faster. Can you explain why?

My guess is that DGL skips the multiplications for pairs of vertices that are not adjacent. In other words, does DGL simplify the matrix multiplication and therefore run faster than PyTorch?

At the current stage, we cannot be faster than the backend frameworks themselves. DGL is built on top of PyTorch/MXNet, which means all computation is eventually translated into PyTorch/MXNet operations. That’s why, if you look at the GCN speedup, we are almost the same as the author’s implementation: we both use sparse-dense matrix multiplication to implement it.
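To make the sparse-dense multiplication above concrete, here is a minimal, framework-free Python sketch. The function names and the toy graph are made up for illustration, and real backends use optimized kernels rather than Python loops. It only shows that a sparse product visits the stored edges and still matches the dense product, which is why non-adjacent vertex pairs cost nothing in either DGL or a hand-written sparse implementation.

```python
# Toy graph with 4 nodes. Each edge is a (src, dst) pair, i.e. a nonzero A[dst][src].
edges = [(1, 0), (0, 1), (2, 1), (1, 2), (2, 3)]
num_nodes = 4
# Node features: one row per node, two features each.
X = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [7.0, 8.0]]


def dense_matmul(edges, num_nodes, X):
    """Compute A @ X with a dense adjacency matrix, touching every (i, j) pair."""
    A = [[0.0] * num_nodes for _ in range(num_nodes)]
    for src, dst in edges:
        A[dst][src] = 1.0
    dim = len(X[0])
    out = [[0.0] * dim for _ in range(num_nodes)]
    for i in range(num_nodes):
        for j in range(num_nodes):
            for k in range(dim):
                out[i][k] += A[i][j] * X[j][k]
    return out


def sparse_dense_matmul(edges, num_nodes, X):
    """Compute A @ X by visiting only the stored edges (the nonzeros of A)."""
    dim = len(X[0])
    out = [[0.0] * dim for _ in range(num_nodes)]
    for src, dst in edges:
        for k in range(dim):
            out[dst][k] += X[src][k]
    return out


dense_result = dense_matmul(edges, num_nodes, X)
sparse_result = sparse_dense_matmul(edges, num_nodes, X)
print(dense_result == sparse_result)
```

The dense version does work proportional to the number of node pairs, while the sparse version does work proportional to the number of edges.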

As an analogy, you can think of DGL as a C++ program and PyTorch/MXNet as an assembly program. Can a C++ program be faster than an assembly program? Every C++ trick can technically be written in assembly, but it is not easy. The merit of DGL is that you don’t need to worry about these tricks. You use high-level message passing APIs to implement the model, and we decide which optimizations to apply.

In the future, we are likely to implement our own kernels for some important operations (such as generalized sparse-dense matrix multiplication). This might further distinguish us from the backend frameworks.

@minjie
Sorry for the unclear wording. I know DGL is built on existing deep-learning frameworks such as PyTorch.
I want to know why GCN implemented with DGL is faster than GCN implemented directly in PyTorch (https://github.com/tkipf/pygcn). What does DGL optimize? Can you give me the details?
Also, what is the main motivation of DGL: making it more convenient to write graph models, or speeding up execution?

What are the tricks? Could you elaborate?

What are the advantages of the high-level message passing APIs?

The advantage of the high-level message passing APIs is that they provide a unified programming interface for graph computation and give the DGL framework the opportunity to optimize for speed. Model developers therefore don’t need to worry much about how to write a more efficient implementation of their model.
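As a rough illustration of what a unified message passing interface looks like, here is a small pure-Python sketch. The helper `update_all` below is a hypothetical stand-in written for this post, not DGL's actual implementation or signature. The point is that the model author supplies only a message function and a reduce function, and the framework is free to choose how to execute them.

```python
from collections import defaultdict


def update_all(edges, feats, message_fn, reduce_fn):
    """Hypothetical helper: send a message along every edge, then reduce per destination."""
    mailbox = defaultdict(list)
    for src, dst in edges:
        mailbox[dst].append(message_fn(feats[src]))
    # Nodes that receive no message keep their old feature.
    return {
        node: reduce_fn(mailbox[node]) if node in mailbox else feat
        for node, feat in feats.items()
    }


edges = [(0, 1), (2, 1), (1, 2)]  # (src, dst) pairs
feats = {0: 1.0, 1: 10.0, 2: 100.0}

# Two different models share the same interface; only the small functions change.
summed = update_all(edges, feats, message_fn=lambda h: h, reduce_fn=sum)
averaged = update_all(
    edges,
    feats,
    message_fn=lambda h: 2 * h,
    reduce_fn=lambda msgs: sum(msgs) / len(msgs),
)
print(summed)
print(averaged)
```

Because the loop over edges lives inside the helper rather than in user code, it could be swapped for a faster execution strategy without changing the model.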

The current trick is to map a user-defined computation (currently, it has to be expressed with built-in functions) to a sparse matrix operation provided by the backend framework. You might have seen this in the GCN example. In addition, DGL implements auto-batching to speed up computation when users write their models with pure user-defined functions (UDFs).
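To give a feel for what auto-batching of UDFs can mean, here is a minimal pure-Python sketch. It assumes one possible strategy, grouping nodes by the number of incoming messages so that each group forms a rectangular batch. That strategy and all names below are assumptions for illustration, not a description of DGL internals.

```python
from collections import defaultdict


def reduce_one_by_one(mailboxes, reduce_fn):
    """Baseline: call the user-defined reduce function once per node."""
    return {node: reduce_fn([msgs])[0] for node, msgs in mailboxes.items()}


def reduce_batched(mailboxes, reduce_fn):
    """Group nodes with the same message count and call reduce_fn once per group."""
    buckets = defaultdict(list)
    for node, msgs in mailboxes.items():
        buckets[len(msgs)].append(node)
    out = {}
    calls = 0
    for nodes in buckets.values():
        # Every mailbox in this batch has the same length, so the batch is rectangular.
        batch = [mailboxes[n] for n in nodes]
        results = reduce_fn(batch)
        calls += 1
        out.update(zip(nodes, results))
    return out, calls


def max_reduce(batch):
    """Example UDF: takes a batch of mailboxes and returns one value per mailbox."""
    return [max(msgs) for msgs in batch]


# Incoming messages per node (illustrative values).
mailboxes = {0: [1.0, 5.0], 1: [2.0], 2: [4.0, 3.0], 3: [9.0], 4: [7.0, 8.0]}
baseline = reduce_one_by_one(mailboxes, max_reduce)
batched, num_calls = reduce_batched(mailboxes, max_reduce)
print(batched == baseline, num_calls)
```

In a tensor framework each rectangular batch would be a single tensor, so the UDF would run once per group as a vectorized operation instead of once per node.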

In the near future, we want to map UDFs to a more efficient form of computation automatically. This kind of transformation happens under the hood, so users don’t need to worry about how it happens.