I used the DGL METIS partitioning script from the distributed node classification tutorial (https://docs.dgl.ai/en/0.8.x/tutorials/dist/1_node_classification.html#sphx-glr-tutorials-dist-1-node-classification-py) to split the papers-100M graph into 4 partitions. I now want to build subgraphs from the existing partitions with 0 halo hops (the partitioning script defaults to 1). This is the function I have come up with:
import dgl


def create_subgraph(rank: int) -> dgl.DGLGraph:
    """
    Creates a subgraph local to the worker consisting of only inner nodes
    :param rank: The rank for which the subgraph needs to be returned
    :return: dgl subgraph
    """
    print(f"Loading the subgraph for - {rank}")
    partition_tuple = dgl.distributed.load_partition(
        part_config='papers-4/ogbn-papers100M.json',
        part_id=rank,
        load_feats=True
    )
    # Element 0 is the local partition graph (inner + halo nodes),
    # element 1 is the dict of node features for this partition.
    graph, node_feats = partition_tuple[0], partition_tuple[1]
    # Keep only the nodes owned by this partition, dropping halo nodes.
    g = dgl.node_subgraph(graph, graph.ndata['inner_node'] == 1)
    # Assumes the feature rows line up with the inner nodes in order.
    g.ndata['feat'] = node_feats['_N/feat']
    g.ndata['label'] = node_feats['_N/labels']
    g.ndata['train_mask'] = node_feats['_N/train_mask']
    g.ndata['test_mask'] = node_feats['_N/test_mask']
    g.ndata['val_mask'] = node_feats['_N/val_mask']
    g = dgl.add_self_loop(g)
    return g
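One thing the function relies on is that the rows in partition_tuple[1] line up, in order, with the inner nodes of the partition graph. I am assuming this rather than having confirmed it. A minimal sketch of a sanity check for that assumption is below; inner_nodes_form_prefix is a hypothetical helper of my own (not a DGL function), and it works on plain Python lists so it can be tried without loading the partition:

```python
from typing import Sequence


def inner_nodes_form_prefix(inner_mask: Sequence[int], num_feat_rows: int) -> bool:
    """Return True if all inner nodes come before all halo nodes and
    their count equals the number of loaded feature rows."""
    num_inner = sum(1 for flag in inner_mask if flag)
    # Inner nodes form a prefix exactly when the first num_inner flags are all set.
    is_prefix = all(inner_mask[i] for i in range(num_inner))
    return is_prefix and num_inner == num_feat_rows


# Toy masks standing in for the real 'inner_node' data of a partition.
print(inner_nodes_form_prefix([1, 1, 1, 0, 0], 3))  # prints True
print(inner_nodes_form_prefix([1, 0, 1, 0, 0], 2))  # prints False
```

On the real partition I would expect to call it as inner_nodes_form_prefix(partition_tuple[0].ndata['inner_node'].tolist(), partition_tuple[1]['_N/feat'].shape[0]). If it returns False, assigning the features directly to the subgraph would presumably misalign them.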
Is this the right approach, or am I missing something? Here, rank is the ID of the partition that needs to be loaded.