OSError: [Errno 99] Cannot assign requested address #2714

Hey,

I am using dgl==0.4.3.
My server initializes, but my client cannot connect to it.

Note that they are running on the same machine.

Error message:
starting server…

KVServer listen at 0.0.0.0:8523

model init
loaded pretrained relation embeddings. dim: 768
Traceback (most recent call last):
  File "/home/ktfp768/CoLAKE/pretrain/run_pretrain.py", line 266, in <module>
    train()
  File "/home/ktfp768/CoLAKE/pretrain/run_pretrain.py", line 198, in train
    cache_dir=PYTORCH_PRETRAINED_BERT_CACHE + '/dist_{}'.format(args.local_rank))
  File "/home/ktfp768/test/lib/python3.6/site-packages/transformers/modeling_utils.py", line 655, in from_pretrained
    model = cls(config, *model_args, **model_kwargs)
  File "…/pretrain/model.py", line 20, in __init__
    self.ent_embeddings = LargeEmbedding(ip_config, emb_name, ent_lr, num_ent)
  File "…/pretrain/large_emb.py", line 185, in __init__
    self.client = EmbClient(server_namebook)
  File "…/pretrain/large_emb.py", line 99, in __init__
    super().__init__(server_namebook, queue_size, net_type)
  File "/home/ktfp768/test/lib/python3.6/site-packages/dgl/contrib/dis_kvstore.py", line 588, in __init__
    self._machine_id = self._get_local_machine_id()
  File "/home/ktfp768/test/lib/python3.6/site-packages/dgl/contrib/dis_kvstore.py", line 981, in _get_local_machine_id
    if ip in self._local_ip4_addr_list():
  File "/home/ktfp768/test/lib/python3.6/site-packages/dgl/contrib/dis_kvstore.py", line 999, in _local_ip4_addr_list
    struct.pack('256s', name[:15].encode("UTF-8")))[20:24])
OSError: [Errno 99] Cannot assign requested address
Exception ignored in: <bound method KVClient.__del__ of <pretrain.large_emb.EmbClient object at 0x7f8edc30c780>>
Traceback (most recent call last):
  File "/home/ktfp768/test/lib/python3.6/site-packages/dgl/contrib/dis_kvstore.py", line 604, in __del__
    _finalize_sender(self._sender)
AttributeError: 'EmbClient' object has no attribute '_sender'
Exception ignored in: <bound method LargeEmbedding.__del__ of LargeEmbedding()>
Traceback (most recent call last):
  File "…/pretrain/large_emb.py", line 204, in __del__
    self.client.shut_down()
  File "/home/ktfp768/test/lib/python3.6/site-packages/torch/nn/modules/module.py", line 779, in __getattr__
    type(self).__name__, name))
torch.nn.modules.module.ModuleAttributeError: 'LargeEmbedding' object has no attribute 'client'
Traceback (most recent call last):
  File "/opt/scp/software/Python/3.6.3-foss-2017a/lib/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/opt/scp/software/Python/3.6.3-foss-2017a/lib/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/home/ktfp768/test/lib/python3.6/site-packages/torch/distributed/launch.py", line 260, in <module>
    main()
  File "/home/ktfp768/test/lib/python3.6/site-packages/torch/distributed/launch.py", line 256, in main
    cmd=cmd)
subprocess.CalledProcessError: Command '['/home/ktfp768/test/bin/python', '-u', '/home/ktfp768/CoLAKE/pretrain/run_pretrain.py', '--local_rank=0', '--name', 'CoLAKE', '--data_prop', '1.0', '--batch_size', '2048', '--lr', '1e-4', '--ent_lr', '1e-4', '--epoch', '1', '--grad_accumulation', '16', '--save_model', '--emb_name', 'entity_emb', '--n_negs', '200', '--beta', '0.98']' returned non-zero exit status 1.

The error is OSError: [Errno 99] Cannot assign requested address.
Can you verify which IP address the client is trying to connect to?

It is trying to connect to the local machine's IP, 127.0.0.1, on an available port.

Could the error be related to this: linux - Get IP address from python - Stack Overflow
I am not sure what the network status of your Colab machine is.
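
For context, the failing line in the traceback (`struct.pack('256s', name[:15].encode("UTF-8")))[20:24]`) looks like the common Linux recipe that asks the kernel for each interface's IPv4 address via the `SIOCGIFADDR` ioctl. That ioctl typically fails with Errno 99 when an interface has no IPv4 address assigned. Below is a minimal sketch of the same recipe that skips such interfaces instead of crashing. It is not DGL's actual code; the function name `local_ip4_addrs` is made up for illustration, and it is Linux-only.

```python
import fcntl
import socket
import struct

SIOCGIFADDR = 0x8915  # Linux ioctl request: get an interface's IPv4 address


def local_ip4_addrs():
    """Return the IPv4 addresses of local interfaces, skipping those without one."""
    addrs = []
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        for _, name in socket.if_nameindex():
            # The request buffer starts with the interface name (at most 15 bytes).
            ifreq = struct.pack("256s", name[:15].encode("UTF-8"))
            try:
                packed = fcntl.ioctl(sock.fileno(), SIOCGIFADDR, ifreq)
            except OSError:
                # For example Errno 99 when the interface has no IPv4 address.
                continue
            # Bytes 20..23 of the reply hold the address in network byte order.
            addrs.append(socket.inet_ntoa(packed[20:24]))
    return addrs
```

Calling `local_ip4_addrs()` on the affected machine and comparing the result with the output of `ip addr` should show which interface lacks an IPv4 address.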

This issue is resolved; see: AttributeError: 'EmbClient' object has no attribute 'set_partition_book' · Issue #7 · txsun1997/CoLAKE · GitHub.
With DGL 0.4.3, one should modify _get_local_machine_id() in KVClient.
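
A rough sketch of what such a modification might look like, not the actual patch from the linked issue. It assumes the server namebook maps a server id to a list whose first two entries are the machine id and the IP address, and that the local-address lookup is passed in as a callable. The name `resolve_machine_id`, that layout, and the fallback to machine id 0 are all assumptions for illustration, so check them against `dis_kvstore.py` in your own DGL install.

```python
def resolve_machine_id(server_namebook, list_local_addrs, default=0):
    """Find the machine id whose IP is local, tolerating a failed address lookup.

    server_namebook: assumed shape {server_id: [machine_id, ip, ...]}.
    list_local_addrs: callable returning the local IPv4 addresses as strings.
    default: machine id to use when nothing matches (assumption: single machine).
    """
    try:
        local_addrs = set(list_local_addrs())
    except OSError:
        # The lookup itself failed (e.g. Errno 99). Falling back is only
        # reasonable when server and client run on the same machine.
        return default

    for _, entry in sorted(server_namebook.items()):
        machine_id, ip = entry[0], entry[1]
        if ip in local_addrs:
            return machine_id
    return default
```

With a single-machine setup like the one described above, the fallback means a broken interface lookup no longer aborts client construction.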
