Following this doc, I am using edge_split to create the training dataset for distributed training.
I found that the split_even method called by edge_split builds the index tensor by calling
eles = F.nonzero_1d(F.tensor(elements))
Our graph has 13 billion edges for training, so the variable elements is a boolean list of length 13 billion. F.tensor consumes 13 billion * 1 byte ≈ 12 GB, and F.nonzero_1d consumes up to 13 billion * 8 bytes ≈ 96 GB (since nearly all entries are True), for a total of about 108 GB. This memory cost cannot be decreased by adding more training workers, because each worker materializes the full index array.
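To illustrate the kind of optimization I have in mind: since each worker ultimately needs only its own even slice of the nonzero indices, the full 8-byte index array never has to be materialized. Below is a minimal NumPy sketch of that idea (nonzero_slice, rank, world_size, and chunk_size are hypothetical names I chose; this is not DGL's API or its current split_even implementation):

```python
import numpy as np

def nonzero_slice(mask, rank, world_size, chunk_size=100_000_000):
    """Return only this rank's even slice of the nonzero indices of a
    boolean mask, scanning chunk by chunk so peak extra memory is one
    chunk of indices plus the output slice. Illustrative sketch only."""
    total = int(np.count_nonzero(mask))        # one pass, no index array
    per = total // world_size
    start = rank * per
    stop = total if rank == world_size - 1 else start + per

    out = np.empty(stop - start, dtype=np.int64)
    seen = 0      # nonzero entries seen in earlier chunks
    filled = 0    # entries already written into `out`
    for off in range(0, len(mask), chunk_size):
        idx = np.flatnonzero(mask[off:off + chunk_size])
        n = idx.size
        # Intersect [seen, seen + n) with this rank's range [start, stop).
        lo = max(start - seen, 0)
        hi = min(stop - seen, n)
        if lo < hi:
            out[filled:filled + hi - lo] = idx[lo:hi] + off
            filled += hi - lo
        seen += n
        if seen >= stop:
            break
    return out
```

With this scheme each worker's peak usage is roughly one chunk of indices plus its own output slice, instead of the full 96 GB index array; the 12 GB boolean mask could similarly be kept as a memory-mapped array.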
Does DGL have any plans to optimize this part? If not, may I try to optimize it and contribute the change?