GCMC multi-process multi-GPU training -- question 2

In the code snippet at lines 216 ~ 226 of /examples/pytorch/gcmc/train_sampling.py,
I don't understand why the following function is required for multi-process training:

def prepare_mp(g):
    """
    Explicitly materialize the CSR, CSC and COO representation of the given graph
    so that they could be shared via copy-on-write to sampler workers and GPU
    trainers.
    This is a workaround before full shared memory support on heterogeneous graphs.
    """
    for etype in g.canonical_etypes:
        g.in_degree(0, etype=etype)
        g.out_degree(0, etype=etype)
        g.find_edges([0], etype=etype)

Currently, a DGL graph maintains three sparse formats, CSR, CSC and COO, because they suit different operators (e.g., CSR is good for finding out-edges while CSC is good for in-edges). Since keeping all three copies of every graph can be memory-inefficient, we create them on demand. In multi-process training, we put the graph in shared memory visible to all processes, which means every format must be materialized beforehand; otherwise each process would build its own copy, causing duplication. That is the reason for this code: it is a temporary workaround until a more thorough solution is in place.

def prepare_mp(g):
    for etype in g.canonical_etypes:
        g.in_degree(0, etype=etype)  # materialize CSC
        g.out_degree(0, etype=etype)  # materialize CSR
        g.find_edges([0], etype=etype)   # materialize COO
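
For context, here is a minimal sketch of how the function is meant to be used: call it once in the parent process, then fork the trainer workers so they inherit the already-materialized formats via copy-on-write. The toy graph, worker body, and process count below are placeholders rather than code from the example, and the construction syntax assumes the DGL 0.4-era API that train_sampling.py was written against.

import dgl
import torch.multiprocessing as mp

def prepare_mp(g):
    # same function as above
    for etype in g.canonical_etypes:
        g.in_degree(0, etype=etype)      # materialize CSC
        g.out_degree(0, etype=etype)     # materialize CSR
        g.find_edges([0], etype=etype)   # materialize COO

def run(proc_id, n_gpus, g):
    # each worker reads the parent's graph via copy-on-write;
    # because every format already exists, no new copies are built here
    print('worker', proc_id, 'sees', g.number_of_edges('rates'), 'rating edges')

if __name__ == '__main__':
    # toy bipartite graph standing in for the MovieLens graph of the example
    g = dgl.heterograph({
        ('user', 'rates', 'movie'): [(0, 0), (1, 1)],
        ('movie', 'rated-by', 'user'): [(0, 0), (1, 1)],
    })
    prepare_mp(g)  # materialize every format once, before forking workers
    n_gpus = 2     # placeholder; the example reads this from command-line args
    procs = []
    for proc_id in range(n_gpus):
        # relies on the default 'fork' start method on Linux so that children
        # share the parent's memory pages instead of pickling the graph
        p = mp.Process(target=run, args=(proc_id, n_gpus, g))
        p.start()
        procs.append(p)
    for p in procs:
        p.join()

If prepare_mp were skipped, the first in_degree/out_degree/find_edges call inside each worker would trigger a private, per-process construction of the corresponding format, multiplying the graph's memory footprint by the number of workers.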