AttributeError: module 'dgl' has no attribute 'distributed' in branch 2.4

When using the files in dgl/examples/pytorch/graphsage/dist, I run into the following problem:

python3 partition_graph.py --dataset ogb-product --num_parts 2 --balance_train --balance_edges
load ogbn-products
This will download 1.38GB. Will you proceed? (y/N)
y
Using exist file products.zip
Extracting dataset/products.zip
Loading necessary files...
This might take a while.
Processing graphs...
100%|██████████████████████████████████████████████████| 1/1 [00:00<00:00,  2.06it/s]
Converting graphs into DGL objects...
100%|██████████████████████████████████████████████████| 1/1 [00:00<00:00,  9.39it/s]
Saving...
finish loading ogbn-products
finish constructing ogbn-products
load ogb-product takes 87.180 seconds
|V|=2449029, |E|=123718280
train: 196615, valid: 39323, test: 2213091
Traceback (most recent call last):
  File "/home/node/workspace/dgl240-project/examples/pytorch/graphsage/dist/partition_graph.py", line 87, in <module>
    dgl.distributed.partition_graph(
AttributeError: module 'dgl' has no attribute 'distributed'

This AttributeError occurs not only when partitioning, but also during distributed training. For now I am using the code from 2.3.1 for graph partitioning.

The example you were using is deprecated; please use this one instead. The issue itself is caused by the fact that dgl.distributed is no longer imported by default since DGL 2.4.
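Concretely, adding an explicit import of dgl.distributed before calling partition_graph should be enough to avoid the AttributeError. A minimal sketch (the output path below is a placeholder, and the dataset loading is simplified compared to the example script):

import dgl
import dgl.distributed  # no longer imported by default since DGL 2.4
from ogb.nodeproppred import DglNodePropPredDataset

# Load ogbn-products and take the DGLGraph.
g, _ = DglNodePropPredDataset(name="ogbn-products")[0]

# Partition into 2 parts, roughly matching the command above.
dgl.distributed.partition_graph(
    g,
    graph_name="ogb-product",
    num_parts=2,
    out_path="data",  # placeholder output directory
    balance_edges=True,
)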

@Rhett-Ying Thank you for your reply

In 2.4, the inference function in DGL/examples/distributed/graphsage/node_classification.py seems to have changed:

1. The position of this statement has changed:

y = dgl.distributed.DistTensor(
    (g.num_nodes(), self.n_hidden),
    th.float32,
    "h",
    persistent=True,
)

2. The content of the second loop has changed:

for input_nodes, output_nodes, blocks in dataloader:
    ...

has been changed to:

for input_nodes, output_nodes, blocks in tqdm.tqdm(dataloader):
    block = blocks[0].to(device)
    h = x[input_nodes].to(device)
    h_dst = h[: block.number_of_dst_nodes()]
    h = layer(block, (h, h_dst))
    if i != len(self.layers) - 1:
        h = self.activation(h)
        h = self.dropout(h)
    # Copy back to CPU as DistTensor requires data reside on CPU.
    y[output_nodes] = h.cpu()

Can you explain the advantages of these improvements?

Can we also use this inference method with other models such as GCN or GAT?
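To make the question concrete, here is a toy sketch of the substitution I have in mind (hypothetical and untested, using a small random graph instead of the distributed setup, just to check that GraphConv accepts the same (h, h_dst) calling pattern on a block):

import torch as th
import dgl
import dgl.nn as dglnn

# Toy graph and single-layer neighbor sampler, standing in for the
# DistGraph and distributed dataloader of the real example.
g = dgl.rand_graph(100, 500)
sampler = dgl.dataloading.NeighborSampler([10])
dataloader = dgl.dataloading.DataLoader(
    g, th.arange(g.num_nodes()), sampler, batch_size=32, shuffle=False
)

# Hypothetical GCN layer; GATConv could be tried the same way.
conv = dglnn.GraphConv(16, 8, allow_zero_in_degree=True)
feat = th.randn(g.num_nodes(), 16)

for input_nodes, output_nodes, blocks in dataloader:
    block = blocks[0]
    h = feat[input_nodes]
    h_dst = h[: block.number_of_dst_nodes()]
    out = conv(block, (h, h_dst))  # same calling pattern as SAGEConv above
    print(out.shape)  # (number of output nodes, 8)
    break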

Please always refer to the latest example, as it is best adapted to the latest dataloader APIs. I doubt the previous one still runs well.

The results obtained with this method on two nodes:

Summary of node classification(GraphSAGE): GraphName ogbn-products | TrainEpochTime(mean) 9.9387 | TestAccuracy 0.7690
Client[1] in group[0] is exiting...
Part 0, Val Acc 0.8222, Test Acc 0.6115, time: 35.4167
Summary of node classification(GraphSAGE): GraphName ogbn-products | TrainEpochTime(mean) 9.7598 | TestAccuracy 0.6115
Client[0] in group[0] is exiting...

Can we add the two TestAccuracy values and average them to get the final result? Does this also work with more nodes?
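For reference, the kind of aggregation I mean is something like the following (a sketch only; the per-part test counts are placeholders, not values from the run above):

def combined_test_accuracy(part_accs, part_test_counts):
    # Weight each partition's accuracy by the number of test nodes it
    # evaluated, rather than taking an unweighted mean.
    total = sum(part_test_counts)
    return sum(acc * n for acc, n in zip(part_accs, part_test_counts)) / total

# Accuracies reported by the two clients above; counts are placeholders.
acc = combined_test_accuracy([0.7690, 0.6115], [1_000_000, 1_213_091])
print(f"combined test accuracy: {acc:.4f}")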
