How can I measure the exact amount of data sent/received by a machine during distributed training?

When I run a DGL distributed training job, each machine sends data to and receives data from the other machines. I want to measure the exact amount: how many bytes does a machine need to send and receive?

However, I haven’t found which part of the DGL code does the push and pull during distributed training. I tried KVServer first: I added some print() calls to distributed/, but none of them produced any output.

The comments say that “For now, KVServer can only support CPU-to-CPU communication”, and I ran the distributed training on GPUs. Is using GPUs the reason why I didn’t get any output?

Should I try RPC_server?


Currently there’s no built-in way to count the bytes sent/received. Adding print() calls to dgl/ at 0b3a6216f57891d5b34e4d5d1318128829580fc1 · dmlc/dgl · GitHub should work. The other one is deprecated.

Using a GPU won’t affect the process: the communication still happens on the CPU, and the data is copied to the GPU afterwards.
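
If a machine-level number is enough, one workaround that does not touch DGL at all is to read the operating system's per-interface byte counters before and after the job and take the difference. The sketch below is not a DGL API. It assumes a Linux machine where /proc/net/dev is available, and the helper names (parse_net_dev, read_counters, traffic_delta) are made up for illustration. It also counts all traffic on each interface, not only DGL's, so it gives an upper bound unless the machine is otherwise idle.

```python
from pathlib import Path


def parse_net_dev(text):
    """Parse /proc/net/dev content into {interface: (rx_bytes, tx_bytes)}."""
    counters = {}
    # The first two lines of /proc/net/dev are column headers.
    for line in text.splitlines()[2:]:
        if ":" not in line:
            continue
        name, fields = line.split(":", 1)
        values = fields.split()
        # Column 0 is received bytes, column 8 is transmitted bytes.
        counters[name.strip()] = (int(values[0]), int(values[8]))
    return counters


def read_counters(path="/proc/net/dev"):
    """Read the current counters (Linux only; the path is an assumption)."""
    return parse_net_dev(Path(path).read_text())


def traffic_delta(before, after):
    """Return {interface: (rx_bytes, tx_bytes)} accumulated between two snapshots."""
    return {
        iface: (after[iface][0] - before[iface][0],
                after[iface][1] - before[iface][1])
        for iface in after
        if iface in before
    }


# Intended use on each machine (left commented out so the block has no side effects):
# before = read_counters()
# ... run the distributed training job ...
# after = read_counters()
# print(traffic_delta(before, after))
```

Looking only at the interface that the trainers and servers actually communicate over (rather than lo or unrelated interfaces) should give the closest estimate.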

