When I run a DGL distributed training job, each machine sends data to and receives data from the other machines. I want to measure the exact amount of data: how many bytes does each machine need to send and receive?
However, I haven't found which part of the DGL code performs the push and pull during distributed training. I tried KVServer: I added some print() calls to distributed/kvstore.py and dis_kvstore.py, but none of them produced any output.
The comments say that "For now, KVServer can only support CPU-to-CPU communication", but I used GPUs to run the distributed training. Is using GPUs the reason why I didn't get any output?
Should I try RPC_server?
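In the meantime, as a coarse workaround I am considering measuring total NIC traffic on each machine around the training run instead of instrumenting DGL internals. This is only a sketch: it reads /proc/net/dev (so it is Linux-only), counts traffic across all interfaces (including anything unrelated running on the machine), and `train()` below is a hypothetical stand-in for the real per-machine training entry point:

```python
def nic_bytes():
    """Sum received/transmitted bytes over all interfaces from /proc/net/dev (Linux)."""
    rx = tx = 0
    with open("/proc/net/dev") as f:
        for line in f.readlines()[2:]:  # skip the two header lines
            _name, data = line.split(":", 1)
            fields = data.split()
            rx += int(fields[0])  # bytes received on this interface
            tx += int(fields[8])  # bytes transmitted on this interface
    return rx, tx

def measure_traffic(workload):
    """Run `workload` and return (bytes_received, bytes_sent) deltas for this machine."""
    rx0, tx0 = nic_bytes()
    workload()
    rx1, tx1 = nic_bytes()
    return rx1 - rx0, tx1 - tx0

# Hypothetical usage: wrap whatever starts training on this machine.
# recv_bytes, sent_bytes = measure_traffic(lambda: train())
```

Would this be a reasonable approximation, or is there a hook inside DGL's RPC layer I could instrument instead?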