A question on distributed learning

I want to predict the energy of molecules from their atomic coordinates. In this setting, each molecule is represented as a single graph with atoms as vertices. If the number of atoms in a molecule is too large to process on a single GPU, can I split the graph into several parts and still update the embeddings via message passing with DGL?
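For reference, here is a minimal sketch of how such a molecular graph might be built in DGL, assuming edges come from a distance cutoff; the function name, cutoff value, and feature keys are only illustrative, not an actual pipeline:

```python
import dgl
import torch

def molecule_to_graph(coords, atom_feats, cutoff=5.0):
    """coords: (N, 3) atom positions; atom_feats: (N, F) atom features.
    Connects atoms whose pairwise distance is below `cutoff` (an assumed value)."""
    dist = torch.cdist(coords, coords)                       # pairwise distances
    src, dst = torch.nonzero((dist < cutoff) & (dist > 0), as_tuple=True)
    g = dgl.graph((src, dst), num_nodes=coords.shape[0])     # atoms as vertices
    g.ndata['feat'] = atom_feats
    g.edata['dist'] = dist[src, dst].unsqueeze(-1)           # per-edge distance feature
    return g
```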

How large are your graphs?

About 30,000 atoms, and the memory of a single GPU is only 30 GB.

30K is not too large. For example, DGL can fit the PubMed graph (~20K nodes) with no problem. Is your graph very dense? How many edges are there?

About 300,000 edges. In my understanding, even though the PubMed graph is also quite large, at every training step only a small subgraph is fed into the model via NodeDataLoader. In my case, however, I need to load the entire graph for message passing, because every atom's embedding is needed to predict the energy.
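To illustrate what I mean, a full-graph model of this kind might look roughly like the sketch below, where all atom embeddings are summed into a single graph-level energy prediction (layer sizes, the sum readout, and the class name are just placeholders):

```python
import dgl
import dgl.nn as dglnn
import torch
import torch.nn as nn

class EnergyGNN(nn.Module):
    """Two SAGEConv message-passing layers followed by a sum readout over all atoms."""
    def __init__(self, in_feats, hidden=128):
        super().__init__()
        self.conv1 = dglnn.SAGEConv(in_feats, hidden, 'mean')
        self.conv2 = dglnn.SAGEConv(hidden, hidden, 'mean')
        self.out = nn.Linear(hidden, 1)

    def forward(self, g, feats):
        h = torch.relu(self.conv1(g, feats))
        h = torch.relu(self.conv2(g, h))
        g.ndata['h'] = h
        hg = dgl.sum_nodes(g, 'h')   # aggregate every atom embedding into one vector
        return self.out(hg)          # predicted energy for the whole molecule
```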

DGL should be able to fit a graph of this size on a 30 GB GPU; you don't need distributed training for this. I have loaded the entire ogbn-products graph (2.4M nodes, 61M edges) into a V100 GPU (16 GB) and trained a GraphSAGE model on it. See the script here: dgl-0.5-benchmark/main_dgl_product_sage.py at master · dglai/dgl-0.5-benchmark · GitHub.
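As a quick sanity check (this is not the linked benchmark script; the graph size and feature dimension are made up to roughly match your numbers, and it assumes a CUDA device is available), you can measure the memory footprint of one full-graph message-passing step like this:

```python
import dgl
import dgl.nn as dglnn
import torch

num_nodes, num_edges = 30_000, 300_000
src = torch.randint(0, num_nodes, (num_edges,))
dst = torch.randint(0, num_nodes, (num_edges,))
g = dgl.graph((src, dst), num_nodes=num_nodes).to('cuda')      # whole graph on one GPU
g.ndata['feat'] = torch.randn(num_nodes, 64, device='cuda')

conv = dglnn.SAGEConv(64, 128, 'mean').to('cuda')
h = conv(g, g.ndata['feat'])                                   # one full-graph message-passing step
print(h.shape, torch.cuda.max_memory_allocated() / 2**20, 'MB')
```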
