My use-case - Unsupervised node representation learning on an unlabelled homogeneous graph using GraphSAGE
Graph Description - The nodes are split into train, val, and test. The subgraph induced by the train nodes is a k-NN graph, while the val and test nodes are connected only to their nearest train nodes.
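To make the graph description concrete, here is a minimal plain-Python sketch of how such a graph could be put together. The toy points, the `build_edges` helper, and the choice to direct each edge from the neighbor (source) to the node that aggregates from it (destination) are all assumptions for illustration, not part of my actual pipeline.

```python
import math

def build_edges(points, train_nid, other_nid, k):
    """Return directed (src, dst) edges in which every node receives
    edges from its k nearest *train* nodes (hypothetical helper)."""
    edges = []
    for dst in list(train_nid) + list(other_nid):
        # Candidate neighbors are always train nodes, never val/test nodes.
        candidates = [t for t in train_nid if t != dst]
        candidates.sort(key=lambda t: math.dist(points[t], points[dst]))
        edges.extend((src, dst) for src in candidates[:k])
    return edges

# Hypothetical toy data: 2-D feature vectors indexed by node ID.
points = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (0.1, 0.1), (0.9, 0.9)]
train_nid, val_nid, test_nid = [0, 1, 2, 3], [4], [5]

edges = build_edges(points, train_nid, val_nid + test_nid, k=2)
```

With this construction no edge ever starts at a val or test node, which is why the subgraph induced by val nodes alone ends up empty.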
I’m building on this example: dgl/train_sampling_unsupervised.py at master · dmlc/dgl · GitHub
My train_dataloader will use the subgraph g.subgraph(train_nid).
How do I calculate validation loss? g.subgraph(val_nid) won’t have any edges, so I cannot use it for my val_dataloader. I need to sample the edges connected to val_nid, and their neighbors will come from my train subgraph.
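The problem can be shown on a toy edge list. This is a library-agnostic sketch in plain Python; the edge list and both helper functions are hypothetical. In DGL I assume the equivalent of the second helper would be something like g.in_edges(val_nid, form='eid') on the full graph, but I have not verified that this is the intended approach.

```python
def induced_edges(edges, nodes):
    """Edges whose endpoints are BOTH in `nodes` (what a node-induced subgraph keeps)."""
    nodes = set(nodes)
    return [(s, d) for s, d in edges if s in nodes and d in nodes]

def incident_edge_ids(edges, nodes):
    """IDs of edges with AT LEAST ONE endpoint in `nodes`."""
    nodes = set(nodes)
    return [eid for eid, (s, d) in enumerate(edges) if s in nodes or d in nodes]

# Hypothetical toy graph: nodes 0-2 are train, node 3 is val, node 4 is test.
edges = [(0, 1), (1, 0), (1, 2), (2, 1), (0, 3), (2, 4)]
train_nid, val_nid = [0, 1, 2], [3]

val_induced = induced_edges(edges, val_nid)    # empty: val nodes only touch train nodes
val_eids = incident_edge_ids(edges, val_nid)   # edge IDs in the FULL graph's numbering
```

The validation edges are therefore only visible in the full graph, and their IDs are expressed in the full graph's numbering rather than in any subgraph's.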
Q1. How do I use the g_sampling parameter of EdgeDataLoader for this to work? Or is there another way?
Q2. Does using one subgraph in EdgeDataLoader and another subgraph in g_sampling cause problems, since creating a subgraph changes all the node and edge IDs?
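To illustrate the ID concern, here is a small plain-Python sketch of what relabelling looks like when a node-induced subgraph is extracted. The `node_subgraph` helper and the toy IDs are hypothetical; DGL does the analogous bookkeeping itself and keeps the original IDs in sg.ndata[dgl.NID] and sg.edata[dgl.EID].

```python
def node_subgraph(edges, nodes):
    """Return (relabelled_edges, orig_nid, orig_eid) for the subgraph induced by `nodes`.

    Nodes are renumbered 0..len(nodes)-1 in the order given; orig_nid and
    orig_eid map the new node/edge IDs back to the parent graph's IDs.
    """
    orig_nid = list(nodes)
    new_id = {old: new for new, old in enumerate(orig_nid)}
    sub_edges, orig_eid = [], []
    for eid, (s, d) in enumerate(edges):
        if s in new_id and d in new_id:
            sub_edges.append((new_id[s], new_id[d]))
            orig_eid.append(eid)
    return sub_edges, orig_nid, orig_eid

# Hypothetical parent graph: nodes 2, 5, 7 are train; node 9 is val.
edges = [(2, 5), (5, 2), (5, 7), (7, 5), (2, 9)]
train_nid = [2, 5, 7]

sub_edges, orig_nid, orig_eid = node_subgraph(edges, train_nid)

# Translating subgraph edges back to parent IDs requires the stored mapping.
restored = [(orig_nid[s], orig_nid[d]) for s, d in sub_edges]
```

Node 5 in the parent graph becomes node 1 in the train subgraph, so IDs taken from one graph cannot be used directly in the other without going through the mapping.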
After training on the train subgraph, I plan to use the model to run inference on the entire graph, as in this function, and use the resulting node embeddings for my downstream task.
Q3. Does this seem correct?
Thanks for the help!