Recently, Cluster GCN proposed using off-the-shelf graph clustering algorithms (such as METIS) to partition the original graph and build a subgraph sampler from the resulting clusters, which makes training on extremely large graphs feasible. Its memory overhead during training is substantially lower than VRGCN's when training deep GCNs on large graphs. DGL's subgraph and batching APIs make it easy to implement Cluster GCN cleanly, so I have put my own version here: click me.
It would be nice if this implementation could be merged into master. Please let me know if you have any questions, and whether I should open an issue for this. Also note that this is only a tentative commit for now, so any suggestions on style and structure are appreciated.
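
For anyone unfamiliar with the idea, here is a rough sketch of the cluster-based batching described above, in plain Python rather than DGL. It is only an illustration under some assumptions: the function names (`partition_nodes`, `induced_subgraph`, `cluster_batches`) are hypothetical and are not taken from my commit or from DGL, and the partitioner is a naive contiguous split standing in for a real clustering algorithm such as METIS.

```python
import random


def partition_nodes(num_nodes, num_clusters):
    """Stand-in partitioner (assumption): contiguous chunks of node IDs.

    A real setup would use a graph clustering algorithm such as METIS,
    which tries to keep most edges inside clusters.
    """
    clusters = [[] for _ in range(num_clusters)]
    for node in range(num_nodes):
        clusters[node * num_clusters // num_nodes].append(node)
    return clusters


def induced_subgraph(edges, nodes):
    """Keep only the edges whose two endpoints are both in `nodes`."""
    keep = set(nodes)
    return [(u, v) for (u, v) in edges if u in keep and v in keep]


def cluster_batches(edges, clusters, clusters_per_batch, rng):
    """Yield (nodes, edges) mini-batches for one epoch.

    Clusters are shuffled, grouped, and merged, so edges between
    clusters that land in the same batch are kept as well.
    """
    order = list(range(len(clusters)))
    rng.shuffle(order)
    for start in range(0, len(order), clusters_per_batch):
        chosen = order[start:start + clusters_per_batch]
        nodes = sorted(n for c in chosen for n in clusters[c])
        yield nodes, induced_subgraph(edges, nodes)


# Toy example: a ring of 8 nodes split into 4 clusters, 2 clusters per batch.
edges = [(i, (i + 1) % 8) for i in range(8)]
clusters = partition_nodes(8, 4)
batches = list(cluster_batches(edges, clusters, 2, random.Random(0)))
for nodes, sub_edges in batches:
    print(nodes, sub_edges)
```

Each batch only ever holds the nodes and edges of the selected clusters, which is where the memory saving comes from: the model is trained on one small induced subgraph at a time instead of on an expanding neighborhood of the full graph.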