Implemented Caching Schemes in Distributed Training

Hello! For context, I am working in a multi-CPU, multi-GPU setting with large graphs, where node features are partitioned across machines and mini-batches are sent to GPUs for compute.

I wanted to follow up on the second question in this post from a year ago: can node features that have not been partitioned onto a machine additionally be cached there for a period of time, or are node features always disjoint across machines? The answer a year ago was no, and I have not found anything in the code, roadmaps, or docs indicating that this has changed or will change, but I would like to confirm.

More generally, if the system provides caching at any level that has a notable effect on performance (besides the k-hop neighborhoods of the partitioned graph structure on each machine, which I believe is sometimes referred to as extra_cached_hops), I would appreciate a pointer to documentation or a quick set of high-level notes I could reference while reading the codebase.
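To make the question concrete, here is a minimal sketch of the kind of caching I have in mind. This is not DGL API; the class `RemoteFeatureCache` and the `fetch_fn` callback are hypothetical, standing in for whatever mechanism would pull a feature row from the machine that owns it:

```python
from collections import OrderedDict

import numpy as np


class RemoteFeatureCache:
    """Hypothetical sketch: an LRU cache for features of nodes that were
    NOT partitioned onto this machine. `fetch_fn` stands in for a remote
    fetch from the owning machine; DGL does not provide this today."""

    def __init__(self, capacity, fetch_fn):
        self.capacity = capacity
        self.fetch_fn = fetch_fn
        self._store = OrderedDict()  # node ID -> feature row, LRU order

    def get(self, nid):
        if nid in self._store:
            self._store.move_to_end(nid)  # cache hit: mark recently used
            return self._store[nid]
        feat = self.fetch_fn(nid)  # cache miss: simulated remote fetch
        self._store[nid] = feat
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently used
        return feat


# Usage: count remote fetches to see cache hits avoid network traffic.
fetched = []

def fake_remote_fetch(nid):
    fetched.append(nid)
    return np.full(4, float(nid))  # stand-in for a real feature row

cache = RemoteFeatureCache(capacity=2, fetch_fn=fake_remote_fetch)
cache.get(1)
cache.get(2)
cache.get(1)  # hit: no new fetch
cache.get(3)  # miss: evicts node 2
print(fetched)  # [1, 2, 3]
```

Even a small cache like this could cut cross-machine feature traffic for frequently sampled boundary nodes, which is why I am asking whether anything along these lines exists or is planned.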

Thank you!

Unfortunately, the answer is still NO.

By the way, do you want to cache any other data besides the features of the current graph's nodes and their neighbors? In other words, something that extra_cached_hops cannot achieve?

I was looking to cache features of nodes that had not been originally partitioned onto the machine. My understanding is that extra_cached_hops caches only the graph structure, not the features; is that correct?

Yes, you’re right. Caching features is an interesting feature, and we need more time to discuss internally how to achieve it. Could you open a feature request on DGL’s GitHub so that we can track it?