Pages of error "IP Address not available for interface."

Distributed training as a whole works fine, but before the graph is loaded I get a lot of errors about the IP address.

What could be the cause? If there is no easy solution, I am thinking of changing the code here (dgl/rpc_client.py at 195f99362d883f8b6d131b70a7868a537e55b786 · dmlc/dgl · GitHub) so that when this error occurs many times, the error message is printed at most once. Would that be reasonable to submit as a PR?
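
For illustration, the print-at-most-once idea could be sketched with a filter from Python's standard logging module. This is only a sketch and not DGL's actual code: it assumes the message is emitted through logging rather than print, and the class name OncePerMessageFilter and the logger name "once-demo" are hypothetical.

```python
import logging


class OncePerMessageFilter(logging.Filter):
    """Let each distinct message through once and drop later repeats."""

    def __init__(self):
        super().__init__()
        self._seen = set()  # formatted messages already emitted

    def filter(self, record):
        msg = record.getMessage()
        if msg in self._seen:
            return False  # repeat: suppress it
        self._seen.add(msg)
        return True  # first occurrence: let it through


# "once-demo" is a hypothetical logger name used only for this sketch.
logger = logging.getLogger("once-demo")
logger.addFilter(OncePerMessageFilter())
```

With this filter attached, repeated calls such as logger.warning("IP Address not available for interface.") reach the handlers only the first time, while different messages still get through.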

Thanks.

It should only bind once, I think. The message appears because you have multiple NICs on your machine, and some of them cannot be bound to the corresponding IP address. Did you see this occur many times?

I see.

Yes, I get pages of this error message every time I run the system.

Hi,

I made a PR, [Fix] Use logging to print warning message in socket creation by VoVAllen · Pull Request #3098 · dmlc/dgl · GitHub, to address this issue. Once it is merged, you can disable the warnings with logging.getLogger("dgl-distributed-socket").setLevel(logging.WARNING+1). Does this fix your issue?
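
As a minimal sketch, the call could be placed near the top of the training script. The logger name is the one given in this reply; the variable name sock_logger is just for illustration, and the setting only has an effect on a DGL build that includes the PR.

```python
import logging

# Logger name taken from the reply above; it only matters if the
# installed DGL version emits these warnings through this logger.
sock_logger = logging.getLogger("dgl-distributed-socket")

# logging.WARNING is 30, so the threshold becomes 31:
# WARNING records are dropped, ERROR (40) and above still pass.
sock_logger.setLevel(logging.WARNING + 1)
```

Raising the level just above WARNING keeps real errors from this logger visible while hiding the repeated interface warnings.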

Looks great! Thank you Allen! :grinning:
