I am using the distributed training Python script as mentioned here. My understanding is that the line

```python
train_nid = dgl.distributed.node_split(g.ndata['train_mask'])
```

gives the node IDs of all the training nodes in the partition. However, these IDs are not the same as the node IDs of the original graph. I found this out when I concatenated the train IDs from each of the worker machines during training and compared them with the original train IDs created while partitioning, as shown below:
```python
import dgl
import torch as th
from ogb.nodeproppred import DglNodePropPredDataset

data = DglNodePropPredDataset(name='ogbn-arxiv')
graph, labels = data[0]
labels = labels[:, 0]
graph.ndata['labels'] = labels

splitted_idx = data.get_idx_split()
train_nid, val_nid, test_nid = splitted_idx['train'], splitted_idx['valid'], splitted_idx['test']

train_mask = th.zeros((graph.number_of_nodes(),), dtype=th.bool)
train_mask[train_nid] = True
val_mask = th.zeros((graph.number_of_nodes(),), dtype=th.bool)
val_mask[val_nid] = True
test_mask = th.zeros((graph.number_of_nodes(),), dtype=th.bool)
test_mask[test_nid] = True
graph.ndata['train_mask'] = train_mask
graph.ndata['val_mask'] = val_mask
graph.ndata['test_mask'] = test_mask

dgl.distributed.partition_graph(graph, graph_name='ogbn-arxiv', num_parts=4,
                                out_path='arxiv-4',
                                balance_ntypes=graph.ndata['train_mask'],
                                balance_edges=True)
```
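For concreteness, the check I ran is equivalent to the following pure-Python sketch (the worker ID lists here are made-up toy data, not real DGL output): concatenate the per-worker train IDs, sort them, and compare against the sorted original train IDs.

```python
# Toy illustration of the check: do the per-worker train IDs,
# taken together, reproduce the original train ID tensor?
original_train_ids = [2, 5, 7, 11, 13, 17]

# Hypothetical train IDs reported by each of 4 workers
# (in a real run these come from dgl.distributed.node_split).
per_worker_ids = [[5, 13], [2], [17, 7], [11]]

# Concatenate across workers, then sort both sides before comparing.
gathered = sorted(i for worker in per_worker_ids for i in worker)
matches = gathered == sorted(original_train_ids)
print(matches)
```

In my actual run the analogous comparison fails, because the IDs returned by `node_split` live in the relabeled ID space.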
Is there a way to also get the original train IDs during distributed training? My end goal is to verify that the train IDs across the worker machines together reconstruct the original train ID tensor.
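For reference, here is what I would expect such a mapping to look like. If partitioning relabels the nodes and an array `orig_nids` records the original ID of each relabeled node (I believe `return_mapping=True` on `partition_graph` provides something like this, but please correct me if not), then recovering the original IDs is just an index lookup. A plain-Python sketch with a toy permutation:

```python
# Toy sketch of mapping partition-relabeled node IDs back to original IDs.
# Assumption: the partitioner relabels nodes and records the original ID of
# each relabeled node, i.e. orig_nids[new_id] == old_id.
orig_nids = [3, 0, 4, 1, 2]   # hypothetical relabeling of a 5-node graph

# Train IDs as seen inside a partition (the new ID space).
train_nid_new = [0, 2]

# Map back to the original ID space with a simple lookup.
train_nid_orig = [orig_nids[i] for i in train_nid_new]
print(train_nid_orig)
```

With the real API this would presumably be something like `orig_nids, orig_eids = dgl.distributed.partition_graph(..., return_mapping=True)`, with the lookup done via tensor indexing — again, that flag is an assumption on my part.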