Purpose behind distributed training

If there is already an option to run a GraphSAGE model with data parallelism, why is distributed training with graph partitioning needed as a separate approach?
src: https://github.com/dmlc/dgl/tree/master/examples/pytorch/graphsage/experimental

Hi, the purpose of distributed training is that in many cases the whole graph cannot fit into a single machine's memory. In that case, we need to split the graph into partitions, each of which fits into a single machine's memory.
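For reference, here is a minimal sketch of how a graph can be partitioned ahead of distributed training with DGL's `dgl.distributed.partition_graph` API. The toy graph, the graph name `my_graph`, the number of partitions, and the output path are placeholders for illustration, not values from this thread:

```python
import dgl
import torch

# Toy graph as a stand-in for a graph too large for one machine's memory.
# In practice you would load your own dataset here.
num_nodes = 1000
src = torch.randint(0, num_nodes, (5000,))
dst = torch.randint(0, num_nodes, (5000,))
g = dgl.graph((src, dst), num_nodes=num_nodes)

# Split the graph into 4 partitions (METIS by default) and write one
# partition per directory under 'partitions/'. Each machine in the
# cluster then loads only its own partition, so no single machine
# has to hold the full graph.
dgl.distributed.partition_graph(
    g,
    graph_name='my_graph',   # hypothetical name for illustration
    num_parts=4,
    out_path='partitions/',
    part_method='metis',
)
```

Each trainer process then works against its local partition (plus a halo of remote neighbors fetched on demand), which is what the distributed GraphSAGE example in the linked repo builds on.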

