Edge_split consumes too much memory

Following this doc, I am using edge_split to create the training dataset for distributed training.
I found that the split_even method called by edge_split builds a tensor of nonzero indices by calling:

eles = F.nonzero_1d(F.tensor(elements))

Our graph has 13 billion edges for training, so the variable elements is a boolean list of length 13 billion. F.tensor consumes 13 billion × 1 byte ≈ 12 GiB of memory, and F.nonzero_1d consumes 13 billion × 8 bytes ≈ 97 GiB. That is about 109 GiB in total, and it cannot be reduced by adding more training workers.
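To make the arithmetic easy to check, here is a small sketch of the estimate. It assumes one byte per boolean element and one 8-byte (int64) index per selected edge, and it takes the worst case where every edge is selected by the mask. The constant and helper names are only illustrative.

```python
# Back-of-the-envelope memory estimate for the mask and the index tensor.
# Assumptions (not measured): 1 byte per boolean, 8 bytes per int64 index,
# and every edge selected by the mask (worst case for the index tensor).
NUM_EDGES = 13_000_000_000
GIB = 1024 ** 3


def to_gib(num_bytes: int) -> float:
    """Convert a byte count to GiB."""
    return num_bytes / GIB


mask_gib = to_gib(NUM_EDGES * 1)   # boolean mask built by F.tensor
index_gib = to_gib(NUM_EDGES * 8)  # int64 indices returned by F.nonzero_1d
total_gib = mask_gib + index_gib

print(f"mask:  {mask_gib:.1f} GiB")
print(f"index: {index_gib:.1f} GiB")
print(f"total: {total_gib:.1f} GiB")
```

Both terms depend only on the total number of edges, not on the number of workers, which is why adding workers does not reduce the peak.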

Does DGL have any plans to optimize this part? If not, may I try to optimize it and contribute the change?

It would be great if you could contribute it.
Could you outline the general design of your solution?
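As a rough illustration of one possible direction (an assumption for discussion, not necessarily the design used in the PR, and not DGL code): scan the mask in chunks, count the selected entries per chunk, and materialize indices only for the chunks that overlap the current worker's share. The function name split_even_chunked, its parameters, and the remainder rule (the first parts get one extra element) are all hypothetical; the sketch uses NumPy rather than the DGL backend.

```python
import numpy as np


def split_even_chunked(mask, num_parts, rank, chunk_size=1 << 20):
    """Return the indices of the `rank`-th even share of True entries in `mask`.

    Hypothetical helper: it never materializes the full index array, only the
    per-chunk counts and the indices that belong to this rank.
    """
    n = len(mask)
    starts = range(0, n, chunk_size)

    # Pass 1: count selected entries per chunk (O(n / chunk_size) extra memory).
    counts = np.array(
        [np.count_nonzero(mask[s:s + chunk_size]) for s in starts],
        dtype=np.int64,
    )
    total = int(counts.sum())

    # Even split over the selected entries; the first `rem` parts get one extra.
    base, rem = divmod(total, num_parts)
    lo = rank * base + min(rank, rem)
    hi = lo + base + (1 if rank < rem else 0)

    # Pass 2: collect indices only from chunks overlapping [lo, hi).
    pieces = []
    seen = 0  # number of selected entries in the chunks processed so far
    for s, c in zip(starts, counts):
        c = int(c)
        if seen + c > lo and seen < hi:
            idx = np.flatnonzero(mask[s:s + chunk_size]) + s
            pieces.append(idx[max(lo - seen, 0):hi - seen])
        seen += c
        if seen >= hi:
            break

    if not pieces:
        return np.empty(0, dtype=np.int64)
    return np.concatenate(pieces)


# Small usage example with a 7-element mask and 2 workers.
mask = np.array([True, False, True, True, False, True, True])
part0 = split_even_chunked(mask, num_parts=2, rank=0, chunk_size=3)
part1 = split_even_chunked(mask, num_parts=2, rank=1, chunk_size=3)
print(part0, part1)
```

With this shape, the index memory per worker scales with that worker's share plus one chunk, at the cost of scanning the mask twice. The boolean mask itself is still held in full here, so reducing that cost would need a separate change.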

I have created a PR for this; we can discuss it further there. Thank you.
