Hi,
This question is similar to "Calculate bytes sent and received with communication time per machine in distributed training", but I didn’t find a straightforward answer there.
- What is the best way to measure the time spent on `rpc` per minibatch per trainer?
- As far as I understand, `g.ndata["features"][input_nodes]` also pulls the features of local nodes, so some time is also spent retrieving those features from the local KVStore. Does this local retrieval overlap with the `rpc` calls?
- How do I measure how many `rpc` calls are made per minibatch per trainer?
- In the post "When and how to fetch features on the remote machine", you mention: "In short, local ids will be converted to global ids and send request to target machines". However, don't the `input_nodes` returned by the dataloader in `for step, (input_nodes, seeds, blocks) in enumerate(dataloader)` already represent `global_ids`?
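For context, the only measurement I can think of so far is a coarse one: wrapping the feature fetch in a wall-clock timer for each minibatch. Below is a minimal sketch of that idea. `StepTimer` and `fake_fetch` are hypothetical names used for illustration; `fake_fetch` is only a stand-in for `g.ndata["features"][input_nodes]` so the snippet runs without a cluster. Note that this measures local KVStore reads and remote pulls together, and it counts fetch calls rather than the underlying `rpc` calls, which is why I am asking for a better way.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class StepTimer:
    """Accumulates wall-clock time and call counts per label (hypothetical helper)."""

    def __init__(self):
        self.totals = defaultdict(float)
        self.counts = defaultdict(int)

    @contextmanager
    def section(self, label):
        start = time.perf_counter()
        try:
            yield
        finally:
            # Record elapsed time even if the wrapped code raises.
            self.totals[label] += time.perf_counter() - start
            self.counts[label] += 1

    def reset(self):
        self.totals.clear()
        self.counts.clear()


def fake_fetch(input_nodes):
    # Stand-in for g.ndata["features"][input_nodes]; not a DGL call.
    time.sleep(0.01)
    return [0.0] * len(input_nodes)


timer = StepTimer()
batches = [[0, 1, 2], [3, 4]]  # placeholder for the dataloader's input_nodes
per_step = []
for step, input_nodes in enumerate(batches):
    timer.reset()  # start fresh for every minibatch
    with timer.section("feature_fetch"):
        feats = fake_fetch(input_nodes)
    per_step.append(
        (step, timer.totals["feature_fetch"], timer.counts["feature_fetch"])
    )

for step, seconds, calls in per_step:
    print(f"step {step}: {calls} fetch call(s), {seconds:.4f} s")
```

In a real training loop the `with timer.section(...)` block would wrap the actual feature lookup, giving one total per minibatch per trainer process.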
Thank you for your time!