I am trying to train the unsupervised GraphSAGE example on my graph:
70 million nodes
5.2 billion edges (2.6 billion unique edges, each stored in both directions to represent an undirected graph).
If I start with 2 GPUs and a batch size of 25K, training seems to start but is extremely slow and will never finish.
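For scale, here is a rough count of training steps per epoch. It assumes that every edge is used as a training seed and that the seeds are split evenly across the GPU processes. Both are assumptions about the setup, not something taken from the example's code.

```python
def batches_per_epoch(num_seed_edges: int, batch_size: int, num_gpus: int) -> int:
    """Steps each GPU process runs per epoch.

    Assumes (hypothetically) that seed edges are split evenly across GPUs
    and that each process uses the full batch_size.
    """
    per_gpu = -(-num_seed_edges // num_gpus)   # ceiling division: seeds per GPU
    return -(-per_gpu // batch_size)           # ceiling division: batches per GPU

EDGES = 5_200_000_000  # directed edge count from the post
for gpus in (2, 8):
    print(gpus, "GPUs:", batches_per_epoch(EDGES, 25_000, gpus), "steps per GPU per epoch")
```

With 2 GPUs this gives 104,000 steps of 25K edges per GPU per epoch. With 8 GPUs it gives 26,000.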
So I tried increasing to 8 GPUs, but then training doesn't even begin: it gets stuck while enumerating the EdgeDataLoader.
Could this be related to CPU memory?
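Here is a rough estimate of the host memory taken by the edge endpoint arrays alone, assuming 64-bit node IDs (8 bytes each). It counts only the raw source/destination arrays. Any sparse formats built for sampling, node features and per-process overhead come on top of this. The `copies` parameter is a hypothetical knob for the case where each training process ends up holding its own copy of the graph instead of sharing one.

```python
def edge_index_gb(num_edges: int, bytes_per_id: int = 8, copies: int = 1) -> float:
    """Size in GB (10**9 bytes) of the src + dst ID arrays of a COO edge list.

    copies: hypothetical number of independent in-memory copies
    (e.g. one per training process if the graph is not shared).
    """
    return num_edges * 2 * bytes_per_id * copies / 1e9

EDGES = 5_200_000_000  # directed edge count from the post
print(f"int64 IDs, 1 copy:  {edge_index_gb(EDGES):.1f} GB")
print(f"int32 IDs, 1 copy:  {edge_index_gb(EDGES, bytes_per_id=4):.1f} GB")
print(f"int64 IDs, 8 copies: {edge_index_gb(EDGES, copies=8):.1f} GB")
```

This comes to about 83.2 GB for a single int64 copy, 41.6 GB with int32 IDs, and 665.6 GB if 8 processes each held their own int64 copy.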
I would appreciate any guidance.