Why is DGL load_graphs so much slower than PYG?

Consider the case where you have 5 million graphs of around 20 nodes each. In an “in memory” dataset scenario, loading them as a list of DGL graphs with load_graphs('file.bin') takes around 15 minutes. However, a PyTorch Geometric .pt file containing the same information takes 5 seconds to load.

Why is this the case?
Thank you

Hi @thegadfly, thanks for reporting this! I ran some tests on the same graphs with DGL and PYG respectively. The results show that PYG is about 4 times faster than DGL:

DGL

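| Number of graphs | Size on disk (MB) | Load time (seconds) |
|------------------|-------------------|---------------------|
| 10W (100,000)    | 105               | 42                  |
| 100W (1,000,000) | 1047              | 422                 |
| 500W (5,000,000) | 5235              | 2067                |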

PYG

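| Number of graphs | Size on disk (MB) | Load time (seconds) |
|------------------|-------------------|---------------------|
| 10W (100,000)    | 105               | 9.72                |
| 100W (1,000,000) | 1057              | 95.53               |
| 500W (5,000,000) | 5246              | 473.26              |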
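
As a quick check on the "about 4 times" figure, the ratios can be computed directly from the load times in the two tables above. This is only arithmetic on the reported numbers; the dictionary names are made up for illustration.

```python
# Load times in seconds, copied from the DGL and PYG tables above.
# Keys are the row labels used in those tables.
dgl_seconds = {"10W": 42.0, "100W": 422.0, "500W": 2067.0}
pyg_seconds = {"10W": 9.72, "100W": 95.53, "500W": 473.26}

# Ratio of DGL load time to PYG load time for each dataset size.
speedup = {label: dgl_seconds[label] / pyg_seconds[label] for label in dgl_seconds}

for label, ratio in speedup.items():
    print(f"{label}: PYG loads {ratio:.2f}x faster than DGL")
```

The ratio stays between roughly 4.3 and 4.5 at all three sizes, and both libraries' load times grow close to linearly with the number of graphs.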

However, in one scenario they perform as you reported: when the source data contains many repeated graphs. In that case PYG achieves a higher compression ratio, so the file takes up very little disk space and loads much faster, e.g.:

PYG with a vector of repeated graphs

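| Number of graphs | Size on disk (MB) | Load time (seconds) |
|------------------|-------------------|---------------------|
| 10W (100,000)    | 0.2               | 0.009               |
| 100W (1,000,000) | 2                 | 0.084               |
| 500W (5,000,000) | 10                | 0.343               |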
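
One mechanism that could produce an effect like this is pickle memoization: when the same Python object appears many times in a list, `pickle` writes it once and then emits short back-references. The sketch below uses only the standard library and a hypothetical `make_graph` stand-in (plain dicts, not real PYG or DGL objects), so it illustrates the general idea rather than confirming what PYG does internally.

```python
import pickle

def make_graph(seed):
    # Hypothetical stand-in for a small graph: 20 nodes and 40 edges.
    src = [(seed + i) % 20 for i in range(40)]
    dst = [(seed * 7 + 3 * i) % 20 for i in range(40)]
    return {"num_nodes": 20, "src": src, "dst": dst}

n = 10_000

# n separate objects: every graph has to be serialized in full.
distinct = [make_graph(i) for i in range(n)]

# n references to one object: pickle serializes it once, then refers back to it.
one_graph = make_graph(0)
repeated = [one_graph] * n

distinct_bytes = len(pickle.dumps(distinct))
repeated_bytes = len(pickle.dumps(repeated))

print(f"distinct graphs: {distinct_bytes} bytes")
print(f"repeated graph:  {repeated_bytes} bytes")

# After loading, the repeated entries are still a single shared object.
restored = pickle.loads(pickle.dumps(repeated))
print("shared after load:", restored[0] is restored[-1])
```

If a benchmark builds its test set by repeating one graph object, it may measure this shortcut rather than the cost of loading distinct graphs.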

In any case, we will investigate further to see where the gap comes from. Could you also provide more information about your dataset, in particular whether it contains many repeated graphs?
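
A rough way to check a dataset for repeated graphs is to fingerprint each graph's structure and count the distinct fingerprints. The sketch below is a minimal version that works on plain `(num_nodes, src, dst)` tuples; `graph_key` and `repeat_stats` are hypothetical helper names, and node or edge features are ignored.

```python
from collections import Counter

def graph_key(num_nodes, src, dst):
    # Fingerprint of the structure that does not depend on edge order.
    return (num_nodes, tuple(sorted(zip(src, dst))))

def repeat_stats(graphs):
    # Count how many graphs there are and how many distinct structures.
    counts = Counter(graph_key(*g) for g in graphs)
    return {"total": sum(counts.values()), "unique": len(counts)}

sample = [
    (3, [0, 1], [1, 2]),
    (3, [1, 0], [2, 1]),  # same edges as the first graph, listed in another order
    (3, [0, 2], [1, 1]),
]
stats = repeat_stats(sample)
print(stats)
```

For a DGL graph `g`, the inputs could be obtained with `g.num_nodes()` and `g.edges()` (converting the returned tensors with `.tolist()`). Note that this only detects graphs with identical node numbering; isomorphic graphs with relabeled nodes count as different.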
