Partition Graph with 0 HALO vertices

pranjaln · May 26, 2023, 11:35am

I need to create partitioned DGL graphs for a personal project, but I do not need the HALO vertices included in the partitions. I am trying to partition the Products graph with num_hops=0. This is the script I am using:

import dgl
import torch as th
from ogb.nodeproppred import DglNodePropPredDataset
data = DglNodePropPredDataset(name='ogbn-products')
graph, labels = data[0]
labels = labels[:, 0]
graph.ndata['labels'] = labels

splitted_idx = data.get_idx_split()
train_nid, val_nid, test_nid = splitted_idx['train'], splitted_idx['valid'], splitted_idx['test']
train_mask = th.zeros((graph.number_of_nodes(),), dtype=th.bool)
train_mask[train_nid] = True
val_mask = th.zeros((graph.number_of_nodes(),), dtype=th.bool)
val_mask[val_nid] = True
test_mask = th.zeros((graph.number_of_nodes(),), dtype=th.bool)
test_mask[test_nid] = True
graph.ndata['train_mask'] = train_mask
graph.ndata['val_mask'] = val_mask
graph.ndata['test_mask'] = test_mask

dgl.distributed.partition_graph(graph, graph_name='ogbn-products', num_parts=32, num_hops=0,
                                out_path='32part_data_products',
                                balance_ntypes=graph.ndata['train_mask'],
                                balance_edges=True)

However, I get the following error:

Traceback (most recent call last):
  File "partition-graph.py", line 22, in <module>
    dgl.distributed.partition_graph(graph, graph_name='ogbn-products', num_parts=32, num_hops=0,
  File "/data/pranjaln/anaconda3/envs/venv/lib/python3.8/site-packages/dgl/distributed/partition.py", line 997, in partition_graph
    assert val[-1] == g.num_edges(etype)
AssertionError

Is there a way to get around this error with num_hops=0? Or, with num_hops=1, can we load back only each partition's own nodes, without the HALO nodes? Any help would be highly appreciated.

Rhett-Ying · May 30, 2023, 8:50am

If no HALO nodes are allowed in a partition, how could edges whose src does not belong to the current partition be saved? For example, say g has an edge 0 -> 1, where 1 is assigned to part_0 and 0 is assigned to part_1. Where would the edge 0 -> 1 be saved if HALO nodes are not allowed? With num_hops=1, 0 -> 1 is saved into part_0, with 1 as the inner node and 0 as the HALO node.

I think this is why you hit the assertion failure: edges like the one above are missing.
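To make the edge-counting argument concrete, here is a minimal pure-Python sketch. It is not DGL's actual implementation; the helper name `count_stored_edges` and the toy assignment are hypothetical. It assumes, as described above, that each edge is stored in the partition that owns its dst node, and that without HALO nodes an edge can only be stored when src and dst share a partition.

```python
def count_stored_edges(edges, node_part, num_parts, allow_halo):
    """Count how many edges each partition can store.

    edges      : list of (src, dst) pairs
    node_part  : dict mapping node ID -> partition ID (hypothetical assignment)
    allow_halo : if True, an edge is stored in the partition owning dst,
                 with src kept as a HALO node when it lives elsewhere;
                 if False, cut edges have no partition to live in.
    """
    counts = [0] * num_parts
    for src, dst in edges:
        same_part = node_part[src] == node_part[dst]
        if allow_halo or same_part:
            counts[node_part[dst]] += 1
    return counts


# Toy undirected graph 0 - 1 - 2 - 3, stored as directed edge pairs.
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (2, 3), (3, 2)]
# As in the example above: node 1 is in part_0 and node 0 is in part_1.
node_part = {0: 1, 1: 0, 2: 0, 3: 1}

with_halo = count_stored_edges(edges, node_part, 2, allow_halo=True)
without_halo = count_stored_edges(edges, node_part, 2, allow_halo=False)

print(sum(with_halo), "of", len(edges), "edges stored with HALO nodes")
print(sum(without_halo), "of", len(edges), "edges stored without HALO nodes")
```

With HALO nodes the per-partition counts add up to the full edge count. Without them the total falls short, which is the kind of mismatch the assertion in partition_graph checks for.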

pranjaln · June 5, 2023, 6:22am

@Rhett-Ying Thanks. I understand now why the error is raised. The Products graph, however, is undirected, and I would like to simply ignore the cut edges. Is there a way in the dgl partition_graph API to ignore the cut edges? That would, in turn, take care of my num_hops=0 requirement.

czkkkkkk · June 8, 2023, 2:54am

Hi @pranjaln

Currently dgl.distributed.partition_graph does not support num_hops=0. As an easy workaround, you can use dgl.subgraph (see the dgl.subgraph page in the DGL 1.1 documentation) to induce the subgraph of each partition's nodes, which leaves out the cut edges.

Rhett-Ying · June 8, 2023, 3:11am

You could use the inner_node and inner_edge fields of the partitioned graphs (in g.ndata and g.edata) to remove the HALO nodes and edges.
