Purpose behind distributed training

If there is already an option to run a GraphSAGE model with data parallelism, why is distributed training with graph partitioning needed as a separate approach?
src: https://github.com/dmlc/dgl/tree/master/examples/pytorch/graphsage/experimental

Hi, the purpose of distributed training is that in many cases the whole graph cannot fit into a single machine's memory. In that case, we need to split the graph into partitions, each of which fits into a single machine's memory.
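For reference, here is a minimal sketch of how a graph can be partitioned ahead of distributed training with DGL's `dgl.distributed.partition_graph` API. The toy graph, the graph name `my_graph`, the number of partitions, and the output path are placeholders for illustration, not values from this thread:

```python
import dgl
import torch

# Toy graph as a stand-in for a graph too large for one machine's memory.
# In practice you would load your own dataset here.
num_nodes = 1000
src = torch.randint(0, num_nodes, (5000,))
dst = torch.randint(0, num_nodes, (5000,))
g = dgl.graph((src, dst), num_nodes=num_nodes)

# Split the graph into 4 partitions (METIS by default) and write one
# partition per directory under 'partitions/'. Each machine in the
# cluster then loads only its own partition, so no single machine
# has to hold the full graph.
dgl.distributed.partition_graph(
    g,
    graph_name='my_graph',   # hypothetical name for illustration
    num_parts=4,
    out_path='partitions/',
    part_method='metis',
)
```

Each trainer process then works against its local partition (plus a halo of remote neighbors fetched on demand), which is what the distributed GraphSAGE example in the linked repo builds on.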

