A question on distributed learning

I want to predict the energy of molecules from their atomic coordinates. In this setting, each molecule is represented as a single graph with atoms as vertices. If the number of atoms in a molecule is too large to process on a single GPU, can I split the graph into several parts and still update the embeddings via message passing with DGL?
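For reference, here is a minimal sketch of how such a molecular graph might be built in DGL, assuming edges come from a distance cutoff; the function name, cutoff value, and feature keys are only illustrative, not an actual pipeline:

```python
import dgl
import torch

def molecule_to_graph(coords, atom_feats, cutoff=5.0):
    """coords: (N, 3) atom positions; atom_feats: (N, F) atom features.
    Connects atoms whose pairwise distance is below `cutoff` (an assumed value)."""
    dist = torch.cdist(coords, coords)                       # pairwise distances
    src, dst = torch.nonzero((dist < cutoff) & (dist > 0), as_tuple=True)
    g = dgl.graph((src, dst), num_nodes=coords.shape[0])     # atoms as vertices
    g.ndata['feat'] = atom_feats
    g.edata['dist'] = dist[src, dst].unsqueeze(-1)           # per-edge distance feature
    return g
```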

How large are your graphs?

About 30,000 atoms, and the memory of a single GPU is only 30 GB.

30K is not too large. For example, DGL can fit the PubMed graph (~20K nodes) with no problem. Is your graph very dense? How many edges are there?

About 300,000 edges. In my understanding, even though the PubMed graph is also quite large, at every training step only a small subgraph is fed into the model via NodeDataLoader. In my case, however, I need to load the entire graph for message passing, because every atom's embedding is needed to predict the energy.
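To illustrate what I mean, a full-graph model of this kind might look roughly like the sketch below, where all atom embeddings are summed into a single graph-level energy prediction (layer sizes, the sum readout, and the class name are just placeholders):

```python
import dgl
import dgl.nn as dglnn
import torch
import torch.nn as nn

class EnergyGNN(nn.Module):
    """Two SAGEConv message-passing layers followed by a sum readout over all atoms."""
    def __init__(self, in_feats, hidden=128):
        super().__init__()
        self.conv1 = dglnn.SAGEConv(in_feats, hidden, 'mean')
        self.conv2 = dglnn.SAGEConv(hidden, hidden, 'mean')
        self.out = nn.Linear(hidden, 1)

    def forward(self, g, feats):
        h = torch.relu(self.conv1(g, feats))
        h = torch.relu(self.conv2(g, h))
        g.ndata['h'] = h
        hg = dgl.sum_nodes(g, 'h')   # aggregate every atom embedding into one vector
        return self.out(hg)          # predicted energy for the whole molecule
```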

DGL should be able to fit a graph of this size on a 30 GB GPU; you don't need distributed training for this. I have loaded the entire ogbn-products graph (2.4M nodes, 61M edges) into a V100 GPU (16 GB) and trained a GraphSAGE model on it. See the script here: dgl-0.5-benchmark/main_dgl_product_sage.py at master · dglai/dgl-0.5-benchmark · GitHub.
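As a quick sanity check (this is not the linked benchmark script; the graph size and feature dimension are made up to roughly match your numbers, and it assumes a CUDA device is available), you can measure the memory footprint of one full-graph message-passing step like this:

```python
import dgl
import dgl.nn as dglnn
import torch

num_nodes, num_edges = 30_000, 300_000
src = torch.randint(0, num_nodes, (num_edges,))
dst = torch.randint(0, num_nodes, (num_edges,))
g = dgl.graph((src, dst), num_nodes=num_nodes).to('cuda')      # whole graph on one GPU
g.ndata['feat'] = torch.randn(num_nodes, 64, device='cuda')

conv = dglnn.SAGEConv(64, 128, 'mean').to('cuda')
h = conv(g, g.ndata['feat'])                                   # one full-graph message-passing step
print(h.shape, torch.cuda.max_memory_allocated() / 2**20, 'MB')
```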
