Implemented Caching Schemes in Distributed Training

Hello! For context, I am working in a multi-CPU, multi-GPU setting with large graphs, where node features are partitioned across machines and mini-batches are sent to GPUs for compute.

I wanted to follow up on the second question in this post from a year ago: can node features that have not been partitioned onto a machine additionally be cached there for a period of time, or are node features always disjoint across machines? The answer a year ago was no, and I have not found anything in the code, roadmaps, or docs indicating that this has changed or will change, but I would like to confirm.

More generally, if the system provides caching at any level that has a notable effect on performance (besides the k-hop neighborhoods of the partitioned graph structure on each machine, which I believe is sometimes referred to as extra_cached_hops), I would appreciate a pointer to documentation or a quick set of high-level notes I could reference while reading the codebase.
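To make the question concrete, here is a minimal sketch of the kind of caching I have in mind. This is not DGL API; the class `RemoteFeatureCache` and the `fetch_fn` callback are hypothetical, standing in for whatever mechanism would pull a feature row from the machine that owns it:

```python
from collections import OrderedDict

import numpy as np


class RemoteFeatureCache:
    """Hypothetical sketch: an LRU cache for features of nodes that were
    NOT partitioned onto this machine. `fetch_fn` stands in for a remote
    fetch from the owning machine; DGL does not provide this today."""

    def __init__(self, capacity, fetch_fn):
        self.capacity = capacity
        self.fetch_fn = fetch_fn
        self._store = OrderedDict()  # node ID -> feature row, LRU order

    def get(self, nid):
        if nid in self._store:
            self._store.move_to_end(nid)  # cache hit: mark recently used
            return self._store[nid]
        feat = self.fetch_fn(nid)  # cache miss: simulated remote fetch
        self._store[nid] = feat
        if len(self._store) > self.capacity:
            self._store.popitem(last=False)  # evict least recently used
        return feat


# Usage: count remote fetches to see cache hits avoid network traffic.
fetched = []

def fake_remote_fetch(nid):
    fetched.append(nid)
    return np.full(4, float(nid))  # stand-in for a real feature row

cache = RemoteFeatureCache(capacity=2, fetch_fn=fake_remote_fetch)
cache.get(1)
cache.get(2)
cache.get(1)  # hit: no new fetch
cache.get(3)  # miss: evicts node 2
print(fetched)  # [1, 2, 3]
```

Even a small cache like this could cut cross-machine feature traffic for frequently sampled boundary nodes, which is why I am asking whether anything along these lines exists or is planned.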

Thank you!

Unfortunately, the answer is still NO.

By the way, do you want to cache any other data besides the features of the current graph's nodes and their neighbors? In other words, something that extra_cached_hops cannot achieve?

I was looking to cache features of nodes that had not been originally partitioned onto the machine. My understanding is that extra_cached_hops caches only the graph structure, not the features; is that correct?

Yes, you’re right. Caching features is an interesting feature, and we need more time to discuss internally how to achieve it. Could you open a feature request on DGL’s GitHub so that we can track it?