Failed to read graph after partition

After partitioning ogb-papers100M with METIS, I found that the command

subg, node_feat, _, gpb, _, node_type, _ = dgl.distributed.load_partition(part_config, rank)

fails to load the partition. The error message is:

loading partitions
Traceback (most recent call last):
  File "graph2bin.py", line 197, in <module>
    subg, node_feat, node_type = readGraph(rank,dataPath,dataName)
  File "graph2bin.py", line 106, in readGraph
    subg, node_feat, _, gpb, _, node_type, _ = dgl.distributed.load_partition(part_config, rank)
  File "/home/xxx/miniconda3/envs/graphtest/lib/python3.8/site-packages/dgl/distributed/partition.py", line 202, in load_partition
    graph = load_graphs(partition_path)[0][0]
  File "/home/xxx/miniconda3/envs/graphtest/lib/python3.8/site-packages/dgl/data/graph_serialize.py", line 195, in load_graphs
    return load_graph_v2(filename, idx_list)
  File "/home/xxx/miniconda3/envs/graphtest/lib/python3.8/site-packages/dgl/data/graph_serialize.py", line 207, in load_graph_v2
    return [gdata.get_graph() for gdata in heterograph_list], label_dict
  File "/home/xxx/miniconda3/envs/graphtest/lib/python3.8/site-packages/dgl/data/graph_serialize.py", line 207, in <listcomp>
    return [gdata.get_graph() for gdata in heterograph_list], label_dict
  File "/home/xxx/miniconda3/envs/graphtest/lib/python3.8/site-packages/dgl/data/heterograph_serialize.py", line 79, in get_graph
    return DGLGraph(gidx, ntype_names, etype_names, nframes, eframes)
  File "/home/xxx/miniconda3/envs/graphtest/lib/python3.8/site-packages/dgl/heterograph.py", line 124, in __init__
    self._init(gidx, ntypes, etypes, node_frames, edge_frames)
  File "/home/xxx/miniconda3/envs/graphtest/lib/python3.8/site-packages/dgl/heterograph.py", line 178, in _init
    self._canonical_etypes = make_canonical_etypes(
  File "/home/xxx/miniconda3/envs/graphtest/lib/python3.8/site-packages/dgl/heterograph.py", line 6370, in make_canonical_etypes
    raise DGLError(
dgl._ffi.base.DGLError: Length of edge type list must match the number of edges in the metagraph. 0 vs 1

Not every partition produces this error, which confuses me. How can I solve this problem?

Which DGL version are you using? Could you share how you did the partitioning? Did you use dgl.distributed.partition_graph()?

DGL version: 1.1.0+cu113

The command for partitioning the graph is

dgl.distributed.partition_graph(g, 'ogb-papers100M', 16, 'data',
                                    part_method='metis',
                                    balance_ntypes=None,
                                    balance_edges=False,
                                    num_trainers_per_machine=1)

Are you partitioning and loading with the same DGL version? Was any error thrown during partitioning?

Could you try re-partitioning the dataset? Or try a smaller num_parts?

Hello, today I re-partitioned the graph and loaded the partitions again, both in the same environment with the same DGL version. No errors occurred during partitioning. However, when I tried to load the partitions, some loaded correctly while others failed, and the errors were not consistent.

error1

Traceback (most recent call last):
  File "trans.py", line 184, in <module>
    subg, node_feat, node_type = readGraph(rank, dataPath, dataName)
  File "trans.py", line 81, in readGraph
    subg, node_feat, _, gpb, _, node_type, _ = dgl.distributed.load_partition(part_config, rank)
  File "/root/miniconda3/envs/DGL/lib/python3.8/site-packages/dgl/distributed/partition.py", line 209, in load_partition
    node_feats, edge_feats = load_partition_feats(part_config, part_id)
  File "/root/miniconda3/envs/DGL/lib/python3.8/site-packages/dgl/distributed/partition.py", line 245, in load_partition_feats
    node_feats = load_tensors(relative_to_config(part_files['node_feats']))
  File "/root/miniconda3/envs/DGL/lib/python3.8/site-packages/dgl/data/tensor_serialize.py", line 68, in load_tensors
    tensor_dict[key] = F.zerocopy_from_dgl_ndarray(value)
  File "/root/miniconda3/envs/DGL/lib/python3.8/site-packages/dgl/backend/pytorch/tensor.py", line 452, in zerocopy_from_dgl_ndarray
    if data.shape == (0,):
  File "/root/miniconda3/envs/DGL/lib/python3.8/site-packages/dgl/_ffi/ndarray.py", line 211, in shape
    for i in range(self.handle.contents.ndim)
AttributeError: 'NoneType' object has no attribute 'contents'
Segmentation fault

error2

loading partitions
Traceback (most recent call last):
  File "trans.py", line 184, in <module>
    subg, node_feat, node_type = readGraph(rank, dataPath, dataName)
  File "trans.py", line 81, in readGraph
    subg, node_feat, _, gpb, _, node_type, _ = dgl.distributed.load_partition(part_config, rank)
  File "/root/miniconda3/envs/DGL/lib/python3.8/site-packages/dgl/distributed/partition.py", line 170, in load_partition
    graph = load_graphs(relative_to_config(part_files['part_graph']))[0][0]
  File "/root/miniconda3/envs/DGL/lib/python3.8/site-packages/dgl/data/graph_serialize.py", line 200, in load_graphs
    return load_graph_v2(filename, idx_list)
  File "/root/miniconda3/envs/DGL/lib/python3.8/site-packages/dgl/data/graph_serialize.py", line 212, in load_graph_v2
    return [gdata.get_graph() for gdata in heterograph_list], label_dict
  File "/root/miniconda3/envs/DGL/lib/python3.8/site-packages/dgl/data/graph_serialize.py", line 212, in <listcomp>
    return [gdata.get_graph() for gdata in heterograph_list], label_dict
  File "/root/miniconda3/envs/DGL/lib/python3.8/site-packages/dgl/data/heterograph_serialize.py", line 78, in get_graph
    return DGLGraph(gidx, ntype_names, etype_names, nframes, eframes)
  File "/root/miniconda3/envs/DGL/lib/python3.8/site-packages/dgl/heterograph.py", line 85, in __init__
    self._init(gidx, ntypes, etypes, node_frames, edge_frames)
  File "/root/miniconda3/envs/DGL/lib/python3.8/site-packages/dgl/heterograph.py", line 128, in _init
    self._canonical_etypes = make_canonical_etypes(
  File "/root/miniconda3/envs/DGL/lib/python3.8/site-packages/dgl/heterograph.py", line 5973, in make_canonical_etypes
    raise DGLError('Length of edge type list must match the number of '
dgl._ffi.base.DGLError: Length of edge type list must match the number of edges in the metagraph. 0 vs 1
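
Since error 1 ends in a segmentation fault, a loop that loads every partition in one process would die at the first bad one. A minimal sketch of one way to find out which partitions fail is to load each one in its own interpreter and collect the return codes. The helper name `probe_partitions` and the `snippet` parameter are made up for this illustration; the only DGL call is the `dgl.distributed.load_partition(part_config, rank)` call already used above.

```python
import subprocess
import sys

# Code run in each child process. load_partition is the same call used in the
# thread; the config path and partition id arrive as command-line arguments.
LOAD_SNIPPET = (
    "import sys, dgl; "
    "dgl.distributed.load_partition(sys.argv[1], int(sys.argv[2]))"
)


def probe_partitions(part_config, num_parts, snippet=LOAD_SNIPPET):
    """Try each partition in its own interpreter; return {part_id: return code}.

    Hypothetical helper for illustration, not part of DGL.
    """
    results = {}
    for part_id in range(num_parts):
        proc = subprocess.run(
            [sys.executable, "-c", snippet, part_config, str(part_id)],
            capture_output=True,
            text=True,
        )
        # 0 means the load finished. On POSIX a negative value means the child
        # was killed by a signal, which is how a segmentation fault shows up.
        results[part_id] = proc.returncode
    return results
```

Assuming the partition config was written to `data/ogb-papers100M.json` (my guess from the `partition_graph` arguments above), calling `probe_partitions("data/ogb-papers100M.json", 16)` and running it twice would show whether the same partitions fail each time. If the failing set changes between runs, that points at the storage or the machine rather than at the partition files themselves.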

The problem has been solved. It was caused by an internal problem with the cloud server, not by DGL.
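
For anyone who hits similar symptoms, a rough way to tell a storage or server problem apart from a DGL problem is to record a checksum for every partition file right after partitioning and verify it before loading. This is only a sketch using the Python standard library; the function names and the manifest file are hypothetical and not part of DGL. It catches files that change or disappear between partitioning and loading, but not data that was already damaged when it was first written.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large feature files need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def write_manifest(part_dir, manifest_path):
    """Record a checksum for every file under the partition output directory."""
    part_dir = Path(part_dir)
    manifest_path = Path(manifest_path)
    manifest = {
        str(p.relative_to(part_dir)): sha256_of(p)
        for p in sorted(part_dir.rglob("*"))
        # Skip the manifest itself in case it lives inside part_dir.
        if p.is_file() and p.resolve() != manifest_path.resolve()
    }
    manifest_path.write_text(json.dumps(manifest, indent=2))
    return manifest


def verify_manifest(part_dir, manifest_path):
    """Return the relative paths that are missing or whose contents changed."""
    part_dir = Path(part_dir)
    manifest = json.loads(Path(manifest_path).read_text())
    bad = []
    for rel, expected in manifest.items():
        p = part_dir / rel
        if not p.is_file() or sha256_of(p) != expected:
            bad.append(rel)
    return bad
```

Running `write_manifest("data", "data/manifest.json")` after partitioning and `verify_manifest("data", "data/manifest.json")` before loading (both paths are assumptions based on the output directory used above) should return an empty list when the files are intact.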
