GraphSAGE question: the training data and validation data have no intersection, so how do the validation nodes get embeddings for the downstream model?

I have read the code from https://github.com/williamleif/GraphSAGE and DGL.

As I understand it, suppose the training nodes are [1,2,3,4,5] and the validation nodes are [6,7]. If we train only on [1,2,3,4,5], how do [6,7] get embeddings for the downstream model?

Thank you very much!

It’s a semi-supervised setting. The topology of the validation nodes is known during training; only their labels are masked.
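A minimal sketch of what "only the labels are masked" can look like, using the node ids from the question. The edge list, labels, and the `masked_nll` helper are hypothetical and chosen only for illustration; this is not code from either repository.

```python
import numpy as np

# Node ids follow the question: 1..5 are training nodes, 6..7 validation nodes.
# Array index i corresponds to node id i + 1.
edges = [(1, 2), (2, 3), (3, 6), (4, 5), (5, 7), (6, 7)]  # hypothetical edges
num_nodes = 7

# Adjacency over ALL nodes: validation nodes keep their edges during training.
adj = np.zeros((num_nodes, num_nodes), dtype=bool)
for u, v in edges:
    adj[u - 1, v - 1] = adj[v - 1, u - 1] = True

# The two masks are disjoint, which matches "no intersection" in the question.
train_mask = np.array([True, True, True, True, True, False, False])
val_mask = ~train_mask


def masked_nll(log_probs, labels, mask):
    """Mean negative log-likelihood over the nodes selected by mask only."""
    picked = log_probs[np.arange(len(labels)), labels]
    return float(-picked[mask].mean())


labels = np.array([0, 1, 0, 1, 0, 1, 1])          # hypothetical labels
log_probs = np.log(np.full((num_nodes, 2), 0.5))  # stand-in for model output

# Only labels of nodes 1..5 enter the training loss; 6..7 are used for validation.
train_loss = masked_nll(log_probs, labels, train_mask)
val_loss = masked_nll(log_probs, labels, val_mask)
```

The masks split the labels, not the graph: node 6 is still connected to training node 3 in `adj`, so it can take part in neighbor aggregation even though its label never contributes to `train_loss`.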

Thank you, but from reading https://github.com/williamleif/GraphSAGE/blob/master/graphsage/minibatch.py (the figure) I found that there is no intersection between the training set and the validation set.

I am aware of https://github.com/williamleif/GraphSAGE/issues/111

But it is still too abstract for me to understand. I hope you can point me to the relevant lines of code.

Thank you in advance!

Looking at the code in https://github.com/williamleif/GraphSAGE/blob/master/graphsage/models.py

It seems that no information is shared between training and prediction.

I understand now.
The graph connectivity (topology) is re-used to compute the final embeddings of unseen nodes.
The initial embeddings come from GloVe word vectors or similar input features.
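To make this concrete, here is a minimal sketch of a single mean-aggregator layer, assuming every node has at least one neighbor. It is a simplified illustration, not the code from the GraphSAGE repository or DGL: the edges, the random features (standing in for something like word vectors), and the random weights (standing in for trained parameters) are all hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)


def mean_sage_layer(feats, neighbors, w_self, w_neigh):
    """h_v = ReLU(x_v @ w_self + mean_{u in N(v)} x_u @ w_neigh)."""
    # Average the input features of each node's neighbors.
    agg = np.stack([feats[nbrs].mean(axis=0) for nbrs in neighbors])
    return np.maximum(feats @ w_self + agg @ w_neigh, 0.0)


# Nodes 1..5 are training nodes, 6..7 are validation nodes (index = id - 1).
edges = [(1, 2), (2, 3), (3, 6), (4, 5), (5, 7), (6, 7)]  # hypothetical edges
neighbors = [[] for _ in range(7)]
for u, v in edges:
    neighbors[u - 1].append(v - 1)
    neighbors[v - 1].append(u - 1)

feats = rng.normal(size=(7, 8))    # stand-in for initial node features
w_self = rng.normal(size=(8, 4))   # stand-in for trained weights
w_neigh = rng.normal(size=(8, 4))  # stand-in for trained weights

# The same weights are applied to every node, so nodes 6 and 7 get embeddings
# from their own features and their neighbors' features.
embeddings = mean_sage_layer(feats, neighbors, w_self, w_neigh)
val_embeddings = embeddings[[5, 6]]
```

The model learns the weight matrices rather than a separate embedding per node, which is why the split between training and validation node ids does not prevent nodes 6 and 7 from getting embeddings.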