I’m trying to perform distributed training on an instance with 8 GPUs. I have a graph with 8.5M nodes and 1.2B edges. I created 8 partitions:
dgl.distributed.partition_graph(graph, 'graph_partition', 8, 'partitions/')
When I try to load the partitioned graph into memory, I get the following error:
>>> g = dgl.distributed.DistGraph('graph_partition', part_config='partitions/graph_partition.json')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/home/ubuntu/anaconda3/envs/tensorflow2_p36/lib/python3.6/site-packages/dgl/distributed/dist_graph.py", line 390, in __init__
    'The standalone mode can only work with the graph data with one partition'
AssertionError: The standalone mode can only work with the graph data with one partition
Is distributed training supported on a single instance with multiple GPUs?