GraphSAGE question: the training data and validation data have no intersection, so how do the validation nodes get embeddings for the downstream model?

guotong1988 · November 19, 2019, 7:26am

I have been reading the code from https://github.com/williamleif/GraphSAGE and DGL.

As I understand it, suppose the training nodes are [1,2,3,4,5] and the validation nodes are [6,7]. If we train only on [1,2,3,4,5], how do nodes [6,7] get embeddings for the downstream model?

Thank you very much!

VoVAllen · November 19, 2019, 7:33am

It’s a semi-supervised setting. Topology of the validation nodes is known when training. Only their labels are masked during training.
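To make the masking idea concrete, here is a minimal pure-Python sketch (not code from the GraphSAGE repo or from DGL). The toy edges, the 2-d features, the labels, and the helper names `build_neighbors` and `mean_aggregate` are all made up for illustration. It assumes the setting described above: every node's edges and input features are available during training, and only the labels of the validation nodes are hidden from the loss.

```python
# Toy graph: edges are known for ALL nodes, including validation nodes 6 and 7.
nodes = [1, 2, 3, 4, 5, 6, 7]
edges = [(1, 2), (2, 3), (3, 4), (4, 5), (5, 6), (6, 7), (1, 7)]
train_nodes = [1, 2, 3, 4, 5]
valid_nodes = [6, 7]

# Hypothetical 2-d input features and labels for every node.
features = {n: [float(n), 1.0] for n in nodes}
labels = {n: n % 2 for n in nodes}


def build_neighbors(nodes, edges):
    """Build an undirected adjacency list from an edge list."""
    nbrs = {n: [] for n in nodes}
    for u, v in edges:
        nbrs[u].append(v)
        nbrs[v].append(u)
    return nbrs


def mean_aggregate(node, nbrs, feats):
    """Concatenate a node's own features with the mean of its neighbors' features."""
    neigh = nbrs[node]
    dim = len(feats[node])
    mean = [sum(feats[m][i] for m in neigh) / len(neigh) for i in range(dim)]
    return feats[node] + mean


neighbors = build_neighbors(nodes, edges)

# Embeddings are computed for every node, validation nodes included,
# because aggregation needs only topology and features, not labels.
embeddings = {n: mean_aggregate(n, neighbors, features) for n in nodes}

# Only training labels are visible to the loss; validation labels are masked.
visible_labels = {n: labels[n] for n in train_nodes}
```

In this sketch, nodes 6 and 7 get embeddings the same way the training nodes do; they simply never contribute a label to the loss.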

guotong1988 · November 19, 2019, 7:40am

Thank you, but from reading https://github.com/williamleif/GraphSAGE/blob/master/graphsage/minibatch.py (the figure) I found that there is no intersection between the training set and the validation set.

I know https://github.com/williamleif/GraphSAGE/issues/111

But it is still too abstract for me to understand. I hope you could point out the relevant lines of code.

Thank you and thank you!

guotong1988 · November 20, 2019, 8:40am

Thank you in advance.

guotong1988 · November 20, 2019, 8:46am

From the code https://github.com/williamleif/GraphSAGE/blob/master/graphsage/models.py

It seems that no information is shared between training and prediction.

guotong1988 · November 21, 2019, 1:50am

I understand now.
The graph connection/topology information is reused to compute the final embeddings of unseen nodes.
The initial embeddings come from GloVe word vectors or something similar.
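For anyone who lands here with the same question, this is a rough pure-Python sketch of that idea, not the actual code in models.py. The weights `w_self` and `w_neigh` are hypothetical stand-ins for parameters learned on the training nodes, and the features, neighbor lists, and function names are invented for illustration. The point it tries to show is that the learned aggregation weights are the only thing carried over from training; an unseen node's embedding is computed from its own input features and those of its neighbors.

```python
def mean_vec(vectors):
    """Element-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(v[i] for v in vectors) / n for i in range(len(vectors[0]))]


def sage_layer(node, neighbors, feats, w_self, w_neigh):
    """One mean-aggregator layer: ReLU(W_self . h_v + W_neigh . mean(h_u))."""
    h_self = feats[node]
    h_neigh = mean_vec([feats[u] for u in neighbors[node]])
    out = []
    for row_s, row_n in zip(w_self, w_neigh):
        z = sum(a * b for a, b in zip(row_s, h_self))
        z += sum(a * b for a, b in zip(row_n, h_neigh))
        out.append(max(0.0, z))  # ReLU
    return out


# Hypothetical weights, assumed to have been learned on training nodes only.
w_self = [[1.0, 0.0], [0.0, 1.0]]
w_neigh = [[0.5, 0.0], [0.0, -1.0]]

# Input features (e.g. word vectors) exist for every node, seen or unseen.
features = {1: [1.0, 1.0], 5: [1.0, 2.0], 6: [0.0, 1.0], 7: [2.0, 0.0]}

# Edges of the unseen nodes 6 and 7, available at prediction time.
neighbors = {6: [5, 7], 7: [6, 1]}

# The same layer and the same weights are applied to nodes never trained on.
unseen_embeddings = {
    n: sage_layer(n, neighbors, features, w_self, w_neigh) for n in (6, 7)
}
```

Nothing node-specific is stored for nodes 6 and 7 here. Their embeddings depend only on the shared weights, their features, and their neighbors, which is why disjoint training and validation sets are not a problem.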