Link prediction only in subgraph / Training a GNN on various graphs

Hi everyone!

I have a question about how I should process my dataset before training a GNN on it.

I have 300 texts and I aim to turn each one of them into a graph. Each text is tagged with attributes and relations. No problem there.

The thing is, later I want to predict some missing relations in new graphs.

My question is which approach is right (and feasible):

  1. Should I create a separate graph from every text? Could I train the GNN on all of them in order to later predict relations on new graphs?

  2. Or should I create one big graph in which every text is, let's say, a subgraph (or partition)? In this case, if I add a new subgraph to the big graph, which contains all the graphs from training, could I predict links JUST in this new subgraph?

Are these approaches valid? If so, is either of them better than the other?

Thank you so much,

Óscar.


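The two options are equivalent. As long as the per-text subgraphs share no edges, one big graph is just the disjoint union of the individual graphs, so message passing never crosses from one text to another. In both cases you will train a two-stage model, which first updates the node representations with a GNN and then scores pairs of nodes for link prediction. Once you have a trained model, you can apply it to arbitrary new graphs to predict missing links. To predict links only in a new subgraph, score only the candidate pairs whose nodes both belong to it.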
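To make the equivalence concrete, here is a minimal NumPy sketch, not a real training setup. The mean-aggregation layer, the dot-product scorer, the toy graphs, and all names (`gnn_layer`, `score_pairs`, `adj_a`, `adj_b`) are assumptions made up for illustration, and the weights are random rather than trained. It only shows that per-graph processing and one big block-diagonal graph produce the same node embeddings and the same link scores.

```python
import numpy as np

rng = np.random.default_rng(0)


def gnn_layer(adj, feats, weight):
    """One mean-aggregation message-passing layer (self-loops added)."""
    adj_hat = adj + np.eye(adj.shape[0])
    deg = adj_hat.sum(axis=1, keepdims=True)
    return np.tanh((adj_hat / deg) @ feats @ weight)


def score_pairs(emb, pairs):
    """Stage two: dot-product score per (u, v) pair, squashed to (0, 1)."""
    u, v = pairs[:, 0], pairs[:, 1]
    logits = (emb[u] * emb[v]).sum(axis=1)
    return 1.0 / (1.0 + np.exp(-logits))


# Two toy graphs standing in for two texts (hypothetical data).
adj_a = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]], dtype=float)
adj_b = np.array([[0, 1], [1, 0]], dtype=float)
feats_a = rng.normal(size=(3, 4))
feats_b = rng.normal(size=(2, 4))
weight = rng.normal(size=(4, 4))  # shared GNN weights (untrained here)

# Option 1: run the GNN on each graph separately.
emb_a = gnn_layer(adj_a, feats_a, weight)
emb_b = gnn_layer(adj_b, feats_b, weight)

# Option 2: one big graph whose adjacency is block-diagonal,
# i.e. no edges between the two subgraphs.
n_a, n_b = adj_a.shape[0], adj_b.shape[0]
adj_big = np.zeros((n_a + n_b, n_a + n_b))
adj_big[:n_a, :n_a] = adj_a
adj_big[n_a:, n_a:] = adj_b
feats_big = np.vstack([feats_a, feats_b])
emb_big = gnn_layer(adj_big, feats_big, weight)

same_embeddings = np.allclose(emb_big, np.vstack([emb_a, emb_b]))

# Predict links JUST in subgraph b: score only pairs inside it.
# In the big graph, its node ids are shifted by n_a.
pairs_b_local = np.array([[0, 1]])
scores_sep = score_pairs(emb_b, pairs_b_local)
scores_big = score_pairs(emb_big, pairs_b_local + n_a)
same_scores = np.allclose(scores_sep, scores_big)

print(same_embeddings, same_scores)
```

Graph libraries that batch several graphs together typically build this kind of disjoint union internally, so in practice option 1 with batching ends up computing the same thing as option 2. The equivalence can break if the model uses operations that mix information across the whole batch (for example, normalization over all nodes).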
