Can I access DistTensor objects in subthreads?

Hi all.

I have a problem: I start a thread that continuously receives messages. When a message arrives and I try to read a DistTensor inside that thread, I get a segmentation fault.

The code snippet is:

    def recv_thread_helper(self, stop):
        while True:
            meta = self.comm_handler.recv()
            if meta.rank == self.rank:
                # if receive metadata from itself, stop
                break
            else:
                srcrank = meta.rank
                tensor = meta.feat_idx
                self.featmap[srcrank] = tensor

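                # Reading the DistTensor here (DistTensor.__getitem__ -> kvstore pull
                # -> rpc fast_pull, per the traceback below) is where the segfault occurs.
                logger.debug("Show: {}".format(
                    self.g.ndata['features'][self.node_idx][0][tensor]))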

            if stop():
                break

And the error message is:

    Fatal Python error: Segmentation fault

    Current thread 0x00007ff63d7ea700 (most recent call first):
      File "/home/xinchen/anaconda3/envs/py37/lib/python3.7/site-packages/dgl/distributed/rpc.py", line 986 in fast_pull
      File "/home/xinchen/anaconda3/envs/py37/lib/python3.7/site-packages/dgl/distributed/kvstore.py", line 1238 in pull
      File "/home/xinchen/anaconda3/envs/py37/lib/python3.7/site-packages/dgl/distributed/dist_tensor.py", line 170 in __getitem__
      File "/data/glusterfs/home/xinchen/dgs_test/agent/agent.py", line 95 in recv_thread_helper
      File "/home/xinchen/anaconda3/envs/py37/lib/python3.7/threading.py", line 870 in run
      File "/home/xinchen/anaconda3/envs/py37/lib/python3.7/threading.py", line 926 in _bootstrap_inner
      File "/home/xinchen/anaconda3/envs/py37/lib/python3.7/threading.py", line 890 in _bootstrap

    Thread 0x00007ff63dfeb700 (most recent call first):
      File "/home/xinchen/anaconda3/envs/py37/lib/python3.7/selectors.py", line 415 in select
      File "/home/xinchen/anaconda3/envs/py37/lib/python3.7/socketserver.py", line 232 in serve_forever
      File "/home/xinchen/anaconda3/envs/py37/lib/python3.7/threading.py", line 870 in run
      File "/home/xinchen/anaconda3/envs/py37/lib/python3.7/threading.py", line 926 in _bootstrap_inner
      File "/home/xinchen/anaconda3/envs/py37/lib/python3.7/threading.py", line 890 in _bootstrap

Can anyone give me some advice please?

@VoVAllen Would you please give me some suggestions about it?

DistTensor is not thread-safe at the moment. What are you trying to achieve? Do you want to overlap the data transfer with the work in the main thread?

@VoVAllen Yes, exactly. I want to overlap the transfer process with the main thread.

Also, for the data transfer part: I only want to update the features of the local nodes, which seem to be stored in the kvstore. Is there a way for me to access and modify them directly?

__setitem__ should work: it calls the push function in the kvstore, which modifies the local part at

@VoVAllen Thanks! Explicitly using __setitem__ works for my case.

Since DistTensor is not thread-safe, how should I handle the following? I start a receiving thread that waits, blocking, for signals from other workers. When a signal arrives, how can I update the features of the local nodes from within that receiving thread?
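One general way to live with a non-thread-safe object, offered here as a common pattern rather than anything confirmed in this thread or in DGL's documentation, is thread confinement: the receiving thread only receives messages and hands them over through a `queue.Queue`, and the main thread, the only thread that ever touches the tensor, applies the updates between training steps. The sketch assumes a hypothetical `FakeFeatureStore` in place of DistTensor (it raises instead of segfaulting when used from the wrong thread); `recv_loop` and `drain_updates` are hypothetical helpers, and a plain list of messages stands in for the blocking `comm_handler.recv()`.

```python
import queue
import threading
import time


class FakeFeatureStore:
    """Hypothetical stand-in for a non-thread-safe DistTensor.

    Only the thread that created it may read or write it; any other thread
    gets a RuntimeError (a real DistTensor may segfault instead).
    """

    def __init__(self, num_nodes, dim):
        self._data = [[0.0] * dim for _ in range(num_nodes)]
        self._owner = threading.get_ident()

    def _check_owner(self):
        if threading.get_ident() != self._owner:
            raise RuntimeError("store accessed from a non-owner thread")

    def __getitem__(self, idx):
        self._check_owner()
        return self._data[idx]

    def __setitem__(self, idx, value):
        self._check_owner()
        self._data[idx] = value


def recv_loop(messages, inbox):
    """Receiving thread: only receives and enqueues; never touches the store."""
    for msg in messages:  # stands in for repeated blocking comm_handler.recv()
        inbox.put(msg)
    inbox.put(None)  # sentinel: no more messages


def drain_updates(store, inbox):
    """Main thread: apply all pending updates without blocking.

    Returns False once the sentinel has been consumed, True otherwise.
    """
    while True:
        try:
            msg = inbox.get_nowait()
        except queue.Empty:
            return True
        if msg is None:
            return False
        node_id, feat = msg
        store[node_id] = feat  # all writes happen on the owner thread


def run_demo():
    store = FakeFeatureStore(num_nodes=4, dim=2)
    inbox = queue.Queue()
    incoming = [(1, [1.0, 1.0]), (3, [3.0, 3.0])]
    receiver = threading.Thread(target=recv_loop, args=(incoming, inbox), daemon=True)
    receiver.start()
    running = True
    while running:
        time.sleep(0.01)  # placeholder for one training step
        running = drain_updates(store, inbox)
    receiver.join()
    return store


store = run_demo()
print(store[1], store[3])  # [1.0, 1.0] [3.0, 3.0]
```

Draining with a non-blocking `get_nowait` keeps the main loop from stalling when no messages are pending, and the `None` sentinel plays the role of the self-sent stop message in the original snippet.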

Could you move the data fetching into the sampler process? Otherwise it's hard to do right now; the basic idea would be to use a separate thread in C++ and wrap it as a future for an asynchronous pull.
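The suggestion above is a C++-level thread wrapped as a future. The Python sketch below only illustrates the shape of that design with the standard library's `concurrent.futures`: one dedicated worker thread performs every pull in submission order, and the caller gets a `Future` it can wait on after doing other work. `SlowStore` and `AsyncPuller` are hypothetical; `SlowStore.pull` simulates a slow remote fetch and is not DGL's kvstore API. Whether a real DistTensor could be driven from such a Python worker thread is not established here (the segfault above suggests it cannot), which is presumably why the suggestion is to do this in C++.

```python
import threading
import time
from concurrent.futures import Future, ThreadPoolExecutor


class SlowStore:
    """Hypothetical remote feature store: pulls are slow, and every calling
    thread is recorded so we can check that only one thread ever used it."""

    def __init__(self, feats):
        self._feats = feats
        self.threads_used = set()

    def pull(self, ids):
        self.threads_used.add(threading.get_ident())
        time.sleep(0.05)  # simulated network latency
        return [self._feats[i] for i in ids]


class AsyncPuller:
    """Runs every pull on one dedicated worker thread and returns Futures."""

    def __init__(self, store):
        self._store = store
        # max_workers=1: all pulls execute on the same thread, in submission order.
        self._pool = ThreadPoolExecutor(max_workers=1)

    def pull_async(self, ids) -> Future:
        return self._pool.submit(self._store.pull, list(ids))

    def close(self):
        self._pool.shutdown(wait=True)


def run_demo():
    store = SlowStore({i: float(i) for i in range(10)})
    puller = AsyncPuller(store)
    fut_a = puller.pull_async([2, 5, 7])  # both pulls are queued to the worker
    fut_b = puller.pull_async([0, 1])
    local = sum(range(1000))  # placeholder for computation that overlaps the pulls
    feats_a = fut_a.result()  # blocks only if this pull has not finished yet
    feats_b = fut_b.result()
    puller.close()
    return store, feats_a, feats_b, local


store, feats_a, feats_b, local = run_demo()
print(feats_a, feats_b)  # [2.0, 5.0, 7.0] [0.0, 1.0]
```

Keeping `max_workers=1` is the design choice that matters: the store is still used from only one thread, so the thread-safety problem is confined rather than removed, while the caller decides when to wait by calling `result()`.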

OK, thanks a lot for your help! I’ll consider moving it somewhere else while maintaining the performance.
