DGL distributed training doesn’t run when the firewall is enabled on the hosts, even though I have opened port 30050 on all of them. I also tried specifying a port in the ip_config.txt file and opening that port in the firewall, along with port 30050, on all hosts, but no joy:
192.168.1.100 10020
192.168.1.101 10020
...
Should the port in the ip_config.txt file be enclosed in brackets, like this?
192.168.1.100 [10020]
192.168.1.101 [10020]
...
Training works fine when I disable the firewall on all hosts (without specifying a port in ip_config.txt), but it doesn’t work with the firewall enabled.
Which ports need to be opened in the firewall for distributed training?
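
In case it helps narrow things down, here is a rough sketch of the kind of check that can be run from each host to see whether the ports listed in ip_config.txt are reachable through the firewall. It is only a diagnostic, not part of DGL: `parse_ip_config` and `port_is_open` are hypothetical helper names, and falling back to 30050 when a line has no port is an assumption based on the setup described above.

```python
import socket


def parse_ip_config(text, default_port=30050):
    """Parse lines of the form '<ip> [port]' into (ip, port) tuples.

    The default port is an assumption taken from the setup in this post.
    """
    entries = []
    for line in text.splitlines():
        parts = line.split()
        if not parts:
            continue  # skip blank lines
        port = int(parts[1]) if len(parts) > 1 else default_port
        entries.append((parts[0], port))
    return entries


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers timeouts (typical when a firewall drops packets) and refused
        # connections (typical when nothing is listening on the port).
        return False


if __name__ == "__main__":
    with open("ip_config.txt") as f:
        for host, port in parse_ip_config(f.read()):
            state = "reachable" if port_is_open(host, port) else "NOT reachable"
            print(f"{host}:{port} {state}")
```

Note that the check only succeeds while something is actually listening on the port, so it has to be run while the servers are up (or with a test listener started on the remote host). It also only covers the ports written in ip_config.txt, so it would not reveal any other ports the training job might use.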