I used my PC to evaluate the two frameworks. It has an NVIDIA 1660 Ti GPU with 6 GB of memory. I used synthetic graphs with a fixed density (0.0008) and wrote a two-layer GAT for both PyG and DGL. It turns out DGL can train on a graph of 230,000 nodes, while PyG can only handle 40,000. Can someone tell me why there is such a huge gap between the two frameworks? I only found a blog post about fused kernels (fused_kernel), but it was written in 2019 and I can't find any other resources. Could someone give me some guidance?

Hi, you are hitting some of the black magic offered by DGL. Joking aside, the fundamental reason is indeed kernel fusion, which avoids storing intermediate values of size O(|E|). You could check out our paper, "Deep Graph Library: A Graph-Centric, Highly-Performant Package for Graph Neural Networks" (arXiv:1909.01315), which gives more details, such as how we formulate message passing as two general kernel patterns called gSpMM and gSDDMM.
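
To make the O(|E|) point concrete, here is a minimal pure-Python sketch. It is not DGL's or PyG's actual implementation. The function names (`aggregate_unfused`, `aggregate_fused`, `edge_buffer_bytes`) are hypothetical, and so are the toy graph and the choice of 64 floats per edge. It also assumes that density means |E| / |V|^2. The unfused version materializes one message per edge before reducing, while the fused version accumulates each message into its destination node right away, in the spirit of an SpMM-style kernel.

```python
# Toy illustration only: real frameworks run these steps as GPU kernels.

def aggregate_unfused(src, dst, feat, weight, num_nodes):
    """Materialize one message per edge, then reduce by destination node."""
    dim = len(feat[0])
    # Intermediate buffer of size O(|E| * dim): this is what fusion avoids.
    messages = [[weight[e] * x for x in feat[src[e]]] for e in range(len(src))]
    out = [[0.0] * dim for _ in range(num_nodes)]
    for e, msg in enumerate(messages):
        for k in range(dim):
            out[dst[e]][k] += msg[k]
    return out, len(messages) * dim  # result, floats held in the edge buffer


def aggregate_fused(src, dst, feat, weight, num_nodes):
    """Compute each message and accumulate it immediately (SpMM-like)."""
    dim = len(feat[0])
    out = [[0.0] * dim for _ in range(num_nodes)]
    for e in range(len(src)):
        acc = out[dst[e]]
        row = feat[src[e]]
        for k in range(dim):
            acc[k] += weight[e] * row[k]
    return out, 0  # no per-edge buffer is ever allocated


def edge_buffer_bytes(num_nodes, density, floats_per_edge, bytes_per_float=4):
    """Size of one per-edge buffer, ASSUMING |E| = density * |V|^2."""
    num_edges = round(density * num_nodes * num_nodes)
    return num_edges * floats_per_edge * bytes_per_float


# Hypothetical 3-node graph with 4 weighted edges (weights stand in for attention).
src = [0, 1, 2, 2]
dst = [1, 2, 0, 1]
feat = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
weight = [0.5, 1.0, 2.0, 0.5]

unfused_out, unfused_floats = aggregate_unfused(src, dst, feat, weight, 3)
fused_out, fused_floats = aggregate_fused(src, dst, feat, weight, 3)
assert unfused_out == fused_out  # same result, different peak memory

# Rough per-edge buffer size for the two graph sizes from the question,
# with an assumed 64 float32 values stored per edge.
for n in (40_000, 230_000):
    gib = edge_buffer_bytes(n, 0.0008, floats_per_edge=64) / 2**30
    print(f"{n} nodes: about {gib:.2f} GiB for one per-edge buffer")
```

Under these assumptions, a single per-edge buffer for the 230,000-node graph is already around 10 GiB, which does not fit in 6 GB, while the node-level output stays small. The exact point where an unfused implementation runs out of memory depends on the hidden size, the number of heads, and how many per-edge tensors are kept for the backward pass, so treat the numbers as an order-of-magnitude estimate rather than a reproduction of the 40,000-node limit.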