GraphSAGE train_cv.py: Sampling Performance

Hello! I found that sampling in train_cv.py (~22 s per epoch) is much slower than in train_sampling.py (~2 s per epoch) on the Reddit dataset. In train_cv.py, generating the history blocks takes extra time: in the test below, sample_history accumulates to ~20 s per epoch.

def sample_blocks(self, seeds):
    seeds = th.LongTensor(seeds)
    blocks = []
    hist_blocks = []
    for fanout in self.fanouts:
        # Fanout-limited frontier for the regular sampled block.
        frontier = dgl.sampling.sample_neighbors(self.g, seeds, fanout)
        block = dgl.to_block(frontier, seeds)
        # Full in-neighborhood frontier for the history block; this is the
        # part being timed into self.sample_history.
        tic = time.time()
        hist_frontier = dgl.in_subgraph(self.g, seeds)
        hist_block = dgl.to_block(hist_frontier, seeds)
        toc = time.time()

        seeds = block.srcdata[dgl.NID]
        blocks.insert(0, block)
        hist_blocks.insert(0, hist_block)
        self.sample_history += toc - tic
    return blocks, hist_blocks

Test settings:
  • batch size: 6000
  • fanout: 2, 2
  • num_workers: 0
  • others: default

The fanout is only 2, 2, and the copy in in_subgraph is lazy, so why are these steps so slow? How can I generate hist_block more efficiently? Thanks!

Given the same fanouts, VRGCN is expected to be slower than GCN with neighbor sampling, because it additionally needs to aggregate the history of all neighbors. That aggregation requires a much larger computation dependency for forward propagation, especially when your graph has a power-law degree distribution with occasional high-degree nodes: sample_neighbors keeps at most fanout in-edges per seed, while in_subgraph pulls in every in-edge of every seed. The time spent in in_subgraph and to_block is just an artifact of materializing that much larger dependency.
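
To see the size gap concretely, here is a small, untested sketch (the random graph and the seed count are made up purely for illustration) that compares how many edges the fanout-limited frontier touches against the full in-neighborhood that the history block has to cover:

    import dgl
    import torch as th

    # Hypothetical graph and seed set, only to illustrate the size difference.
    g = dgl.rand_graph(10000, 500000)
    seeds = th.randint(0, g.num_nodes(), (6000,)).unique()

    sampled = dgl.sampling.sample_neighbors(g, seeds, 2)  # at most 2 in-edges kept per seed
    full = dgl.in_subgraph(g, seeds)                       # every in-edge of every seed

    print(sampled.num_edges())  # bounded by 2 * len(seeds)
    print(full.num_edges())     # grows with the actual in-degrees of the seeds

On a graph with high-degree nodes like Reddit, the second frontier can be dramatically larger than the first, and to_block has to relabel all of those edges, which is where the extra time per epoch goes.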

That being said, an alternative solution would be to avoid generating the history blocks altogether. Instead, one can store the "aggregation of history" tensor as a node feature and refresh it with in-place updates. I have yet to test the idea and see how much speedup it gives, though.
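
For what it is worth, a rough, untested sketch of that idea could look like the following; every name in it (agg_hist, init_agg_history, update_agg_history) is hypothetical and not part of train_cv.py:

    import torch as th

    def init_agg_history(g, hidden_dim):
        # Keep the "aggregation of history" as a node feature on the full graph,
        # allocated once instead of being rebuilt from a history block per batch.
        g.ndata['agg_hist'] = th.zeros(g.num_nodes(), hidden_dim)

    @th.no_grad()
    def update_agg_history(g, nodes, new_agg):
        # In-place refresh of only the rows touched by the current mini-batch,
        # so no in_subgraph/to_block call is needed at sampling time.
        g.ndata['agg_hist'][nodes] = new_agg.detach().cpu()

The forward pass would then read the stored rows with something like g.ndata['agg_hist'][block.dstdata[dgl.NID]] instead of aggregating over a hist_block.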

Thanks for your help. I will try this method.