GraphSAGE question

aure_bnp · July 24, 2023, 9:43am

Hi DGL community,

I have a question about how GraphSAGE works.

I am currently using it through the SAGEConv module from the DGL library.
I have a graph whose nodes carry one of three labels (0, 1 and 2). I only use two of them to compute the loss, say labels 0 and 1, so this is a binary classification problem. The catch is that I still need the embeddings of the nodes with the third, unused label (2) to generate the embeddings of the nodes labelled 0 or 1.

So in dgl.dataloading.DataLoader I only pass the nodes with labels 0 and 1. But to compute the output embedding of a node v (label 0 or 1) at iteration K = 2, for instance, I need the embedding of some neighbor u, which may have label 0, 1 or 2. What I don't know is how SAGEConv computes the embeddings of the label-2 nodes at the intermediate iterations, given that they are not included in the dataloader, in order to compute the embedding of node v.
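To make the question concrete, here is a minimal pure-Python sketch (not DGL code) of the set of nodes a K-layer model has to touch for a batch of seeds. It assumes full neighborhoods rather than sampled fanouts, a hypothetical toy graph given as in-neighbor lists, and that each node's own previous representation is needed (the self term of GraphSAGE). Only label-0/1 nodes are seeds, yet label-2 node 2 lands in the layer-1 frontier.

```python
from typing import Dict, List, Set


def receptive_field(adj: Dict[int, List[int]], seeds: List[int], num_layers: int) -> List[Set[int]]:
    """Return frontiers where frontiers[k] holds the nodes whose layer-k
    representation is needed to compute the seeds' layer-`num_layers` output.

    frontiers[num_layers] is the seed set. Going backwards, each frontier keeps
    the nodes of the next one (self term) and adds their in-neighbors.
    """
    frontiers: List[Set[int]] = [set() for _ in range(num_layers + 1)]
    frontiers[num_layers] = set(seeds)
    for k in range(num_layers - 1, -1, -1):
        needed = set(frontiers[k + 1])  # each node's own previous state
        for v in frontiers[k + 1]:
            needed.update(adj.get(v, []))  # plus its in-neighbors
        frontiers[k] = needed
    return frontiers


# Hypothetical toy graph: in-neighbor lists and node labels.
labels = {0: 0, 1: 1, 2: 2, 3: 2, 4: 0}
adj = {0: [2], 1: [0], 2: [3, 4], 3: [], 4: []}

# Only label-0/1 nodes go into the training dataloader.
train_nids = sorted(n for n, y in labels.items() if y in (0, 1))
frontiers = receptive_field(adj, train_nids, num_layers=2)
for k, nodes in enumerate(frontiers):
    print(k, sorted(nodes))
# prints:
# 0 [0, 1, 2, 3, 4]
# 1 [0, 1, 2, 4]
# 2 [0, 1, 4]
```

As far as I understand DGL, a dgl.dataloading.DataLoader driven by a multi-layer sampler (e.g. dgl.dataloading.MultiLayerFullNeighborSampler(2)) builds one block per layer along these lines: the input_nodes it yields correspond to frontiers[0] and the source nodes of the last block to frontiers[1], so label-2 neighbors are pulled in even though they are not seeds. With fanout-limited sampling (dgl.dataloading.NeighborSampler) the frontiers become random subsets of these.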

To summarize: I only train my SAGEConv model on labels 0 and 1, but I need the label-2 nodes to compute the embeddings of the nodes labelled 0 and 1.

I hope my question is clear.

Thank you all!

peizhou001 · July 26, 2023, 10:01am

Hope I understand your question correctly! It seems you want to train on nodes with labels 0 and 1, but also obtain embeddings for nodes with label 2.

First, all your seed nodes should have label 0 or 1; do not include nodes with label 2. Second, you can generate random (learnable) embeddings for the nodes with label 2, and these nodes will still take part in training: they are neighbors of nodes with labels 0 and 1, so backpropagation propagates the error to them and updates their embeddings. As a result, the embeddings of the label-2 nodes are trained as well.
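A tiny pure-Python sketch of this point, under loud assumptions: a single mean-aggregator layer with scalar features, a sigmoid output, and a hypothetical learnable scalar standing in for the label-2 node's embedding. The loss only covers the label-0/1 seeds, yet its gradient with respect to the label-2 node's input is non-zero, which is what lets backpropagation update it. The analytic gradient is checked against central finite differences.

```python
import math
from typing import Dict, List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def logit(x: Dict[int, float], adj: Dict[int, List[int]], v: int,
          w_self: float, w_neigh: float) -> float:
    """One scalar GraphSAGE-style layer: w_self * x_v + w_neigh * mean of neighbors."""
    neigh = adj[v]
    agg = sum(x[u] for u in neigh) / len(neigh) if neigh else 0.0
    return w_self * x[v] + w_neigh * agg


def seed_loss(x, adj, targets, w_self, w_neigh) -> float:
    """Mean binary cross-entropy over the seed nodes only (labels 0 and 1)."""
    total = 0.0
    for v, y in targets.items():
        p = sigmoid(logit(x, adj, v, w_self, w_neigh))
        total -= y * math.log(p) + (1 - y) * math.log(1 - p)
    return total / len(targets)


def grad_wrt_input(x, adj, targets, w_self, w_neigh, node) -> float:
    """Analytic dLoss/dx[node]; for sigmoid + BCE, dLoss/dlogit = p - y."""
    g = 0.0
    for v, y in targets.items():
        p = sigmoid(logit(x, adj, v, w_self, w_neigh))
        dlogit = w_self if v == node else 0.0
        if adj[v]:
            # node contributes to v's mean aggregation once per occurrence
            dlogit += w_neigh * adj[v].count(node) / len(adj[v])
        g += (p - y) * dlogit
    return g / len(targets)


adj = {0: [2], 1: [0, 2], 2: [0]}  # hypothetical in-neighbor lists
targets = {0: 0, 1: 1}             # seeds: node 0 has label 0, node 1 has label 1
x = {0: 0.5, 1: -0.3, 2: 0.1}      # x[2]: learnable embedding of the label-2 node

analytic = grad_wrt_input(x, adj, targets, 0.8, 1.2, node=2)
eps = 1e-6
x_hi, x_lo = dict(x), dict(x)
x_hi[2] += eps
x_lo[2] -= eps
numeric = (seed_loss(x_hi, adj, targets, 0.8, 1.2)
           - seed_loss(x_lo, adj, targets, 0.8, 1.2)) / (2 * eps)
print(round(analytic, 4), round(numeric, 4))
```

In a real DGL/PyTorch model autograd does this bookkeeping; one plausible (assumed, not confirmed in the thread) way to make the label-2 inputs learnable is a torch.nn.Embedding indexed by those nodes' IDs.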

aure_bnp · July 26, 2023, 3:17pm

Sorry, my question is not easy to understand, but that is exactly the idea.

Here is an image of what I was talking about (image not reproduced here):

I want to compute the embedding of the green node with K = 2 (i.e. two SAGE layers). For that, I need the embeddings of the grey node (surrounded by an orange square) at steps k = 0 and k = 1.
To generate those embeddings, I need to apply a forward propagation of GraphSAGE (k = 0, 1 and 2) on all nodes.
My question is: if my node dataloader (whose batches are then passed to the forward propagation, loss computation and backpropagation) only contains the nodes with labels 0 and 1, does the SAGEConv module compute the embeddings of the grey nodes at step k = 1? At step k = 0, the embeddings are simply initialized with the node features. If it does not, what does it do when a grey node is sampled? It must compute them somehow; otherwise the generated embedding would not match what the authors describe.
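The dependency described in the question can be written out with the forward-propagation rule from the GraphSAGE paper (Hamilton et al., 2017), sketched here as I understand it with the per-layer normalization step omitted; $\mathcal{N}(v)$ is the (sampled) neighborhood of $v$ and $\mathbf{x}_v$ its input features. The first three lines are the general rule; the last two specialize it to the green node $v$ and a grey neighbor $u$:

```latex
\begin{aligned}
\mathbf{h}_v^{0} &= \mathbf{x}_v \\
\mathbf{h}_{\mathcal{N}(v)}^{k} &= \mathrm{AGGREGATE}_k\big(\{\mathbf{h}_w^{k-1} : w \in \mathcal{N}(v)\}\big) \\
\mathbf{h}_v^{k} &= \sigma\big(\mathbf{W}^{k} \cdot \mathrm{CONCAT}(\mathbf{h}_v^{k-1}, \mathbf{h}_{\mathcal{N}(v)}^{k})\big) \\[4pt]
\mathbf{h}_v^{2} &= \sigma\big(\mathbf{W}^{2} \cdot \mathrm{CONCAT}(\mathbf{h}_v^{1}, \mathrm{AGGREGATE}_2(\{\mathbf{h}_u^{1} : u \in \mathcal{N}(v)\}))\big) \\
\mathbf{h}_u^{1} &= \sigma\big(\mathbf{W}^{1} \cdot \mathrm{CONCAT}(\mathbf{x}_u, \mathrm{AGGREGATE}_1(\{\mathbf{x}_w : w \in \mathcal{N}(u)\}))\big)
\end{aligned}
```

So $\mathbf{h}_v^{2}$ needs $\mathbf{h}_u^{1}$ for every sampled neighbor $u$, whatever its label, and $\mathbf{h}_u^{1}$ in turn needs the raw features of $u$'s own neighbors.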

peizhou001 · August 9, 2023, 9:32am

Yes, you are right. If grey nodes are sampled at k = 0 or k = 1, they are computed and updated in the forward and backward passes, even if they are not chosen as seed nodes.
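A minimal pure-Python sketch of this answer, assuming scalar features, a mean aggregator with ReLU and hand-picked weights (a hypothetical simplification, not SAGEConv's actual parameterization), on a toy graph where only nodes 0, 1 and 4 are seeds and node 2 is a grey (label-2) neighbor. Layer 1 is evaluated on every node that layer 2 needs, grey node 2 included, and its output feeds the seed's layer-2 embedding.

```python
from typing import Dict, List, Set


def sage_mean_layer(h_prev: Dict[int, float], adj: Dict[int, List[int]],
                    dst: Set[int], w_self: float, w_neigh: float) -> Dict[int, float]:
    """Scalar GraphSAGE-style layer: h_v = relu(w_self * h_v + w_neigh * mean_u h_u)."""
    out = {}
    for v in dst:
        neigh = adj.get(v, [])
        agg = sum(h_prev[u] for u in neigh) / len(neigh) if neigh else 0.0
        out[v] = max(0.0, w_self * h_prev[v] + w_neigh * agg)
    return out


adj = {0: [2], 1: [0], 2: [3, 4], 3: [], 4: []}  # in-neighbor lists
x = {0: 1.0, 1: 2.0, 2: 3.0, 3: 4.0, 4: 5.0}     # k = 0: input features
seeds = {0, 1, 4}                                 # label-0/1 nodes only
layer1_dst = {0, 1, 2, 4}                         # seeds plus their neighbors (node 2 is grey)

h1 = sage_mean_layer(x, adj, layer1_dst, w_self=0.5, w_neigh=0.5)  # k = 1, includes node 2
h2 = sage_mean_layer(h1, adj, seeds, w_self=0.5, w_neigh=0.5)      # k = 2, seeds only
print(h1[2], h2[0])  # prints: 3.75 2.875
```

In DGL terms, as I understand it, node 2 would be a source node of the last block and therefore a destination node of blocks[0], so the first SAGEConv layer computes its k = 1 output, and gradients flow back through it during the backward pass.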

system · September 8, 2023, 9:32am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.