Inductive Link Prediction on Heterogeneous Graphs

Hello everyone, I have a question regarding a link prediction task. I have a dataset of multiple heterogeneous graphs. I want to train a GNN model on a set of graphs and then perform link prediction on completely unseen graphs in the test set, i.e. predict their adjacency matrices. Some information about the dataset:

  • 900 heterogeneous graphs (1 node type, 10 edge types).
  • There can be multiple edges of different types (at most 1 per edge type) between any pair of nodes.
  • The graphs are very sparse: each edge type accounts for <1% of all possible links between pairs of nodes. This would lead to a heavily imbalanced classification problem if a model has to output a complete adjacency matrix, including all edge types, for a new graph.

My questions are:

  1. Is there an example in the library of training on multiple heterogeneous graphs for the link prediction task? I have looked but did not find one that suits my use case.
  2. One idea is to merge all the graphs into a single super-graph of disconnected components and then train on that super-graph with the link prediction task. However, I do not know how to distribute the negative/positive sampling over the graph (e.g. equally across each disconnected component, or just randomly over the whole super-graph), and I cannot figure out how to perform inference after training.

Could you give me some pointers on how to solve this problem? Thank you.

Hi @stefanschutz, I think you could try the DGL GraphBolt module to write a custom negative sampler. See "6.3 Training GNN for Link Prediction with Neighborhood Sampling" in the DGL 2.1 documentation.
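
To make the "custom negative sampler" idea a bit more concrete, here is a rough sketch in plain Python. It does not use the GraphBolt API; the function name, the (src, dst, etype) edge layout and the node_graph_id list are all assumptions made up for illustration. It assumes the graphs have been merged into one super-graph, that every node remembers which original graph it came from, and that edges are directed triples. For each positive edge it corrupts the destination with a node from the same original graph, so negatives never cross graph boundaries and each graph receives negatives in proportion to its positives.

```python
import random
from collections import defaultdict


def sample_negatives_per_graph(edges, node_graph_id, k=1, seed=0, max_tries=50):
    """Hypothetical helper: for each positive (src, dst, etype), draw up to k
    negatives (src, dst', etype) where dst' is in the same original graph as
    src and the corrupted edge is not a known positive of that edge type."""
    rng = random.Random(seed)

    # Group super-graph node ids by the original graph they belong to.
    nodes_by_graph = defaultdict(list)
    for node, gid in enumerate(node_graph_id):
        nodes_by_graph[gid].append(node)

    positives = set(edges)
    negatives = []
    for src, dst, etype in edges:
        candidates = nodes_by_graph[node_graph_id[src]]
        found = 0
        # Rejection sampling with a cap, so tiny or dense graphs cannot loop forever.
        for _ in range(max_tries):
            if found == k:
                break
            cand = rng.choice(candidates)
            # Assumption: self-loops are not valid negatives.
            if cand != src and (src, cand, etype) not in positives:
                negatives.append((src, cand, etype))
                found += 1
    return negatives


# Toy super-graph: nodes 0-2 come from graph 0, nodes 3-6 from graph 1.
node_graph_id = [0, 0, 0, 1, 1, 1, 1]
edges = [(0, 1, 0), (1, 2, 3), (3, 4, 0), (5, 6, 9)]
negs = sample_negatives_per_graph(edges, node_graph_id, k=2)
```

With this layout, inference on an unseen graph would not need sampling at all: one could enumerate every node pair of that graph for each edge type and score them with the trained predictor, again as an assumption about how the model is set up.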

Thank you. Using this module seems like a very good direction. I will keep you updated.