Dist training orig_id question

lixusign · November 23, 2020, 7:04am

Hello, I am training with version 0.52 and I call the API as follows:

ndata = g._get_all_ndata_names()

The returned names do not include orig_id.

When I access g.ndata[‘orig_id’] I get a KeyError, because this dist tensor does not exist, even though I partitioned the graph with reshuffle=True.

Why is that? Thanks for your attention.

How can I get orig_id during training?

mctt90 · November 23, 2020, 7:35am

What do you mean by ‘origin_id’? Do you mean the global ID? If so, you can use g.ndata[NID] to get the global node IDs and g.edata[EID] to get the global edge IDs.

lixusign · November 23, 2020, 7:47am

I mean the original ID from before the reshuffle. I saw this in the documentation:
“However, the original IDs are still accessible through g.ndata['orig_id'] and g.edata['orig_id'], where g is a DistGraph object (see the section of DistGraph).”

lixusign · November 23, 2020, 9:02am

“By default, the partition API assigns new IDs to the nodes and edges in the input graph to help locate nodes/edges during distributed training/inference. After assigning IDs, the partition API shuffles all node data and edge data accordingly. During the training, users just use the new node/edge IDs. However, the original IDs are still accessible through g.ndata['orig_id'] and g.edata['orig_id'] , where g is a DistGraph object (see the section of DistGraph).”

Why can’t I get my business IDs (the original IDs) from g.ndata[‘orig_id’] in DistGraph.NodeDataView?
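For readers following along, the reshuffling described in the quoted documentation can be pictured with a small pure-Python toy. This is only a sketch: it assumes new IDs are assigned so that each partition owns a contiguous range, and the names `reshuffle_ids` and `part_of_node` are hypothetical, not DGL API.

```python
def reshuffle_ids(part_of_node):
    """Assign new node IDs so that each partition owns a contiguous ID range.

    part_of_node[i] is the partition of the node whose ORIGINAL ID is i.
    Returns (orig_id, new_id) where orig_id[new] = original ID and
    new_id[orig] = new ID. Hypothetical helper, not part of DGL.
    """
    # A stable sort by partition keeps the original order inside each partition.
    orig_id = sorted(range(len(part_of_node)), key=lambda n: part_of_node[n])
    new_id = [0] * len(part_of_node)
    for new, orig in enumerate(orig_id):
        new_id[orig] = new
    return orig_id, new_id


# Five nodes: original IDs 1, 3, 4 go to partition 0; IDs 0, 2 go to partition 1.
part_of_node = [1, 0, 1, 0, 0]
orig_id, new_id = reshuffle_ids(part_of_node)

# Node data is shuffled the same way, so row `new` holds the data of node orig_id[new].
features = ["a", "b", "c", "d", "e"]
shuffled_features = [features[o] for o in orig_id]
```

In this picture, the `orig_id` list plays the role of the ‘orig_id’ tensor: indexing it with a new ID gives back the ID the node had before partitioning.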

lixusign · November 23, 2020, 10:51am

I need help. Am I misunderstanding something?

lixusign · November 26, 2020, 2:52am

I need help: how can I get the orig_id from before the reshuffle?

lixusign · November 27, 2020, 6:26am

Can anybody help?

mctt90 · November 30, 2020, 7:28am

This is potentially a bug. We will look into this issue in detail.

zhengda1936 · November 30, 2020, 8:01am

@lixusign this is indeed a bug. However, ‘orig_id’ is still available in the partitioned graph. It’s just not exposed to the user in DistGraph. If you load the partition graph (something like data/part0/graph.dgl), you can still see ‘orig_id’ is in the node data and edge data.

lixusign · November 30, 2020, 8:23am

So, I’m going to store ‘orig_id’ in the DistGraph while partitioning the graph data, using a DistTensor to hold it. Is it possible to do that?

zhengda1936 · November 30, 2020, 4:30pm

Potentially, you can change these two functions:

github.com/dmlc/dgl/blob/master/python/dgl/distributed/dist_graph.py#L60


    """
    def __init__(self, graph_name):
        self._graph_name = graph_name

    def __getstate__(self):
        return self._graph_name

    def __setstate__(self, state):
        self._graph_name = state

def _copy_graph_to_shared_mem(g, graph_name):
    new_g = g.shared_memory(graph_name, formats='csc')
    # We should share the node/edge data to the client explicitly instead of putting them
    # in the KVStore because some of the node/edge data may be duplicated.
    local_node_path = _get_ndata_path(graph_name, 'inner_node')
    new_g.ndata['inner_node'] = _to_shared_mem(g.ndata['inner_node'], local_node_path)
    local_edge_path = _get_edata_path(graph_name, 'inner_edge')
    new_g.edata['inner_edge'] = _to_shared_mem(g.edata['inner_edge'], local_edge_path)
    new_g.ndata[NID] = _to_shared_mem(g.ndata[NID], _get_ndata_path(graph_name, NID))
    new_g.edata[EID] = _to_shared_mem(g.edata[EID], _get_edata_path(graph_name, EID))
    return new_g

github.com/dmlc/dgl/blob/master/python/dgl/distributed/dist_graph.py#L111


    This is called by the DistGraph client to access the edge data in the DistGraph server
    with shared memory.
    '''
    shape = (g.number_of_edges(),)
    dtype = FIELD_DICT[name]
    dtype = DTYPE_DICT[dtype]
    data = empty_shared_mem(_get_edata_path(graph_name, name), False, shape, dtype)
    dlpack = data.to_dlpack()
    return F.zerocopy_from_dlpack(dlpack)

def _get_graph_from_shared_mem(graph_name):
    ''' Get the graph from the DistGraph server.

    The DistGraph server puts the graph structure of the local partition in the shared memory.
    The client can access the graph structure and some metadata on nodes and edges directly
    through shared memory to reduce the overhead of data access.
    '''
    g, ntypes, etypes = heterograph_index.create_heterograph_from_shared_memory(graph_name)
    if g is None:
        return None
    g = DGLHeteroGraph(g, ntypes, etypes)

so that the ‘orig_id’ tensors are shared to the client with shared memory. If you do so, you can get ‘orig_id’ from g._g.ndata[‘orig_id’], where g is a DistGraph object.

zhengda1936 · November 30, 2020, 4:38pm

I’m curious how you would like to use the ‘orig_id’ tensors: during training, or during inference?

lixusign · December 1, 2020, 2:19am

Thanks very much. I want to dump the embeddings from the prediction layer, so I need to know the relationship between my original node IDs and the NIDs.
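A minimal sketch of that dump step, for illustration only. It assumes row i of the embedding table belongs to the node with new ID i and that a new-to-original mapping is already available as a dict; `dump_embeddings` and the toy values are hypothetical.

```python
import csv
import io


def dump_embeddings(embeddings, new_to_orig, out):
    """Write one tab-separated line per node: original ID, then the embedding values.

    Assumption: embeddings[i] is the vector of the node whose NEW ID is i,
    and new_to_orig maps each new ID to the original (business) ID.
    """
    writer = csv.writer(out, delimiter="\t", lineterminator="\n")
    for new_id, vector in enumerate(embeddings):
        writer.writerow([new_to_orig[new_id]] + [f"{x:.6f}" for x in vector])


# Toy data: two nodes with 2-dimensional embeddings.
toy_embeddings = [[0.1, 0.2], [0.3, 0.4]]
toy_mapping = {0: 42, 1: 17}

buffer = io.StringIO()  # a real run would pass an open file instead
dump_embeddings(toy_embeddings, toy_mapping, buffer)
dumped_text = buffer.getvalue()
```

Writing to any file-like object keeps the sketch independent of where the embeddings end up; with framework tensors, converting each row to a list first would be one way to feed it in.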

zhengda1936 · December 1, 2020, 2:23am

If you don’t need the mapping during training, you can load the partition graphs and construct the mapping yourself for now.
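One way this workaround could look, as a hedged sketch. It assumes each partition file (something like data/part0/graph.dgl) loads with `dgl.load_graphs`, that its node data holds the new global IDs under `dgl.NID` plus ‘orig_id’ and ‘inner_node’, and that the PyTorch backend is in use so tensors support `.tolist()`. The helper names are hypothetical, and the toy data stands in for real partitions.

```python
def load_partition_ndata(path):
    """Read one partition file and return the ID fields as plain Python lists.

    Assumption: the file holds a single graph whose ndata contains dgl.NID
    (new global IDs), 'orig_id' and 'inner_node'.
    """
    import dgl  # imported lazily so the helper below also works without DGL installed

    graphs, _ = dgl.load_graphs(path)
    g = graphs[0]
    return {
        "nid": g.ndata[dgl.NID].tolist(),
        "orig_id": g.ndata["orig_id"].tolist(),
        "inner_node": g.ndata["inner_node"].tolist(),
    }


def build_new_to_orig(partitions):
    """Merge per-partition fields into one dict: new global ID -> original ID.

    Only nodes flagged as inner are used, so nodes that a partition merely
    holds as copies of another partition's nodes are skipped.
    """
    mapping = {}
    for part in partitions:
        for nid, orig, inner in zip(part["nid"], part["orig_id"], part["inner_node"]):
            if inner:
                mapping[nid] = orig
    return mapping


# Toy stand-in for two loaded partitions; the third node of each is a copy.
toy_partitions = [
    {"nid": [0, 1, 2], "orig_id": [7, 3, 9], "inner_node": [1, 1, 0]},
    {"nid": [2, 3, 0], "orig_id": [9, 5, 7], "inner_node": [1, 1, 0]},
]
new_to_orig = build_new_to_orig(toy_partitions)
```

With real data, `build_new_to_orig` would be fed the result of calling `load_partition_ndata` once per partition file; the same approach would apply to edges using the edge data fields.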

lixusign · December 1, 2020, 2:25am

Thanks a lot. I will also try your first method, and loading the partitions on their own.

zhengda1936 · December 1, 2020, 2:30am

If the first option works, feel free to submit a PR to help us fix it.

lixusign · December 1, 2020, 2:32am

Thanks, I will if I can.
