Distributed training: orig_id question

Hello, I am training with version 0.52 and call the API as follows:

ndata = g._get_all_ndata_names()

I can't find orig_id in the result.

When I use g.ndata['orig_id'] I get a KeyError, because this dist tensor does not exist, even though I partitioned the graph with reshuffle=True.

Why? Thanks for your attention.

How can I get orig_id during training?

What do you mean by 'origin_id'? Do you mean the global ID? If so, you can use g.ndata[dgl.NID] to get the global node IDs and g.edata[dgl.EID] to get the global edge IDs.

I mean the original IDs from before the reshuffle. I saw this sentence in the doc:
"However, the original IDs are still accessible through g.ndata['orig_id'] and g.edata['orig_id'], where g is a DistGraph object (see the section of DistGraph)."

It appears in this paragraph:

"By default, the partition API assigns new IDs to the nodes and edges in the input graph to help locate nodes/edges during distributed training/inference. After assigning IDs, the partition API shuffles all node data and edge data accordingly. During the training, users just use the new node/edge IDs. However, the original IDs are still accessible through g.ndata['orig_id'] and g.edata['orig_id'], where g is a DistGraph object (see the section of DistGraph)."

Why can't I get my business IDs (the original IDs) through g.ndata['orig_id'] in DistGraph.NodeDataView?

I need help. Am I misunderstanding something?

How can I get the orig_id from before the reshuffle?

Can anybody help?

This is potentially a bug. We will dive into detail about this issue.

@lixusign this is indeed a bug. However, 'orig_id' is still available in the partitioned graph. It's just not exposed to the user in DistGraph. If you load the partitioned graph (something like data/part0/graph.dgl), you can still see that 'orig_id' is in the node data and edge data.

So I'm going to store 'orig_id' in the DistGraph while partitioning the graph data, and use a DistTensor to hold it. Is it possible to do that?

Potentially, you can change these two functions:



so that the 'orig_id' tensors are shared with the client through shared memory. If you do so, you can get 'orig_id' from g._g.ndata['orig_id'], where g is a DistGraph object.

I'm curious how you would like to use the 'orig_id' tensors: during training, or during inference?

Thanks very much. I want to dump the embeddings in the prediction layer, so I need to know the mapping between my original node IDs and the NIDs.

If you don't need the mapping during training, you can load the partitioned graphs and construct the mapping yourself for now.
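As a rough sketch of what constructing the mapping yourself could look like: the helper below is plain Python and only assumes that each partition gives you three equal-length sequences (the new global node IDs, the original IDs, and a mask marking the nodes the partition owns). The names build_id_mapping, merge_mappings and load_partition_mapping are made up for illustration. The DGL part is an assumption: it supposes a PyTorch backend, that dgl.distributed.load_partition returns the local graph as its first element, and that the local graph carries dgl.NID, 'orig_id' and 'inner_node' in its node data, so please check it against your DGL version.

```python
def build_id_mapping(new_ids, orig_ids, inner_mask):
    """Map new (shuffled) global node IDs to original IDs for one partition.

    Each argument has one entry per node of the partition's local graph.
    Nodes whose mask entry is false (halo nodes owned by another partition)
    are skipped, so every node is recorded by its owning partition only.
    """
    mapping = {}
    for new_id, orig_id, is_inner in zip(new_ids, orig_ids, inner_mask):
        if is_inner:
            mapping[int(new_id)] = int(orig_id)
    return mapping


def merge_mappings(per_partition):
    """Combine the per-partition dicts into one new-ID -> original-ID dict."""
    merged = {}
    for mapping in per_partition:
        merged.update(mapping)
    return merged


def load_partition_mapping(part_config, part_id):
    """Hypothetical DGL wrapper (assumed field names, see the note above)."""
    import dgl  # imported lazily so the helpers above work without DGL

    # Assumption: the first element of the returned tuple is the local graph.
    local_g = dgl.distributed.load_partition(part_config, part_id)[0]
    return build_id_mapping(
        local_g.ndata[dgl.NID].tolist(),       # new global node IDs
        local_g.ndata["orig_id"].tolist(),     # IDs before the reshuffle
        local_g.ndata["inner_node"].tolist(),  # 1 for nodes owned here
    )
```

With the merged dict, a row of the dumped embeddings that belongs to new node ID n can be written out under mapping[n], i.e. under the original business ID.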

Thanks a lot. I will also try your first method of loading the partitions on their own.

If the first option works, feel free to submit a PR to help us fix it :slight_smile:

:grinning: Thanks, I will if I can.
