@Rhett-Ying Thank you for your reply.
As far as I understand, accessing elements of a `DistTensor` triggers a remote request whenever the data is not present on the local machine, and this is exactly what I want to avoid.
I have features and attributes for some nodes that are not part of the original graph. These node ids are neither sequential nor contiguous, e.g. `nids = {1, 7, 5678, 98476, ...}`. I have separate data for each partition, which needs to be stored on its own machine, and I cannot store this data in the graph structure during partitioning because that would defeat the objective I am trying to achieve.
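To make it concrete, the per-partition extra data looks roughly like the sketch below (the node ids match the example above; the values and the `weight` attribute are made up for illustration):

```python
import torch

# Illustrative only: sparse, non-contiguous node ids with a feature
# vector and extra attributes per node; each partition has its own dict.
ext_data = {
    1:     {"feat": torch.randn(128), "weight": 0.3},
    7:     {"feat": torch.randn(128), "weight": 1.2},
    5678:  {"feat": torch.randn(128), "weight": 0.7},
    98476: {"feat": torch.randn(128), "weight": 0.1},
}
```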
Now, I could store this data in a separate class and access it that way, but that involves extra memory-copy overhead which I want to avoid. Also, since the nodes for which I have extra data are not sequential, I chose a dictionary where the key is the node id and the value is the feature vector plus the other attributes, so that I can access everything by key. As a workaround, I could maintain two tensors, one holding the nids and the other holding the features and attributes, look up the index of a required node in the first tensor, and then fetch the corresponding row from the second tensor; however, that seems like a lot of overhead, especially for large graphs such as ogbn-papers100M. That is why I chose the dictionary structure; see the sketch below.
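The two-tensor workaround would look something like this sketch (all names are mine, not DGL API), and the extra lookup per batch is the overhead I am worried about at the scale of ogbn-papers100M:

```python
import torch

# Two-tensor workaround sketch: keep the sparse node ids and their
# features in two aligned tensors and look up positions at access time.
ext_nids = torch.tensor([1, 7, 5678, 98476])   # sorted node ids
ext_feats = torch.randn(len(ext_nids), 128)    # row i belongs to ext_nids[i]

def lookup(nids):
    # Position of each requested id in ext_nids (assumes all ids are present).
    pos = torch.searchsorted(ext_nids, nids)
    return ext_feats[pos]

print(lookup(torch.tensor([7, 98476])).shape)  # torch.Size([2, 128])
```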
I want to store this dictionary in shared memory local to each machine to avoid the extra memory consumption, copies, and transfers as well as the remote requests; that is why I cannot use `DistTensor`.
Is there a way to store this dictionary in the shared memory of each machine under a name, e.g. `ext_data`, and then access it during training?
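To illustrate what I mean (the `ext_data` name and the use of Python's own `multiprocessing.shared_memory` are just my assumptions to show the intent; I do not know whether DGL exposes anything equivalent):

```python
import pickle
from multiprocessing import shared_memory

import torch

# Machine-local extra data; "ext_data" is the name I would like to register it under.
ext_data = {1: torch.randn(128), 7: torch.randn(128)}

# Writer (once per machine): serialize the dict into a named shared-memory
# block that other local processes can attach to by name.
payload = pickle.dumps(ext_data)
shm = shared_memory.SharedMemory(name="ext_data", create=True, size=len(payload))
shm.buf[:len(payload)] = payload

# Reader (e.g. a trainer process on the same machine): attach by name, with
# no remote request involved. The pickle round-trip still copies, though,
# which is part of what I am hoping can be avoided.
shm_reader = shared_memory.SharedMemory(name="ext_data")
local_ext_data = pickle.loads(bytes(shm_reader.buf))
```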
If I store that data in `g.ndata` as you suggested, will it be kept in the local shared memory of each machine? And how would I then access it inside the `pull` function, where the graph object is not available? My objective is to inter-mingle this data with the node features that are pulled in the `pull` function during training, roughly as in the sketch below.
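What I ultimately want in the training loop is roughly this (the `combine` function and all names are hypothetical; `pulled_feats` stands for whatever the pull path returns for the batch):

```python
import torch

def combine(pulled_feats, batch_nids, ext_data, ext_dim=128):
    # pulled_feats: features the normal pull path returns for batch_nids.
    # ext_data: machine-local {nid: extra feature vector} dictionary.
    extra = torch.stack([
        ext_data.get(int(nid), torch.zeros(ext_dim)) for nid in batch_nids
    ])
    # Inter-mingle: concatenate the extra features with the pulled ones.
    return torch.cat([pulled_feats, extra], dim=1)

batch_nids = torch.tensor([1, 7, 42])
pulled = torch.randn(3, 64)
ext_data = {1: torch.randn(128), 7: torch.randn(128)}
print(combine(pulled, batch_nids, ext_data).shape)  # torch.Size([3, 192])
```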
P.S. The set of nodes for which I have data is not fixed, which means the number of entries in the dictionary grows and shrinks depending on the logic during training.
Thank you for taking the time to read this lengthy question.