Inductive Link Prediction on Heterogeneous Graphs

stefanschutz · January 2, 2024, 12:33am

Hello everyone, I have a question about a link prediction task. I currently have a dataset of multiple heterogeneous graphs. I want to train a GNN model on a set of graphs and then perform link prediction on completely unseen graphs in the test set, i.e. predict their adjacency matrices. Some information about the dataset:

900 heterogeneous graphs (1 node type, 10 edge types).
A pair of nodes can be connected by multiple edges of different types (at most 1 per edge type).
The graphs are very sparse: each edge type covers <1% of all possible node pairs. A model that outputs a complete adjacency matrix, including all edge types, for a new graph would therefore face a heavily imbalanced classification problem.
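
To put a number on the imbalance described above, here is a rough sketch. It assumes directed edges without self-loops (the post does not say whether the graphs are directed), and the node count and per-type edge counts are made up for illustration. The negatives-per-positive ratio it returns is one possible choice for a positive-class weight in a binary cross-entropy loss.

```python
def edge_type_imbalance(num_nodes, edge_counts):
    """Return {edge_type: (density, negatives_per_positive)} for one graph.

    Assumes directed edges, no self-loops, and at most one edge per type
    between an ordered node pair, so there are n * (n - 1) candidate pairs.
    """
    possible = num_nodes * (num_nodes - 1)
    stats = {}
    for etype, num_pos in edge_counts.items():
        density = num_pos / possible
        # Negatives per positive; usable as a positive-class weight.
        ratio = (possible - num_pos) / num_pos if num_pos else float("inf")
        stats[etype] = (density, ratio)
    return stats


# Hypothetical graph: 100 nodes, two of the ten edge types shown.
stats = edge_type_imbalance(100, {"type_0": 50, "type_1": 90})
for etype, (density, ratio) in stats.items():
    print(f"{etype}: density={density:.4f}, negatives per positive={ratio:.0f}")
```

At densities below 1%, every positive is outnumbered by roughly a hundred or more negatives, which is why scoring all pairs without reweighting or negative subsampling tends to push a model toward predicting "no edge" everywhere.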

My questions are:

Is there an example in the library of training on multiple heterogeneous graphs for link prediction? I looked but did not find one that fits my use case.
One idea is to merge all the graphs into a single super-graph made of disconnected components and train link prediction on that. However, I do not know how to distribute the positive/negative sampling over the graph (e.g. equally across the disconnected components, or randomly over the whole super-graph), and I cannot figure out how to perform inference after training.
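
The super-graph idea and the inference step can be sketched in plain Python, independent of any library. Everything here is an assumption for illustration: graphs are represented as `(num_nodes, edges)` tuples with `(src, dst, etype)` triples, and `score_fn` is a hypothetical stand-in for whatever decoder the trained model exposes. Merging only requires shifting node IDs by a running offset; inference on an unseen graph then amounts to scoring every candidate pair per edge type and thresholding.

```python
def merge_graphs(graphs):
    """Merge graphs into one disconnected super-graph.

    Each graph is (num_nodes, edges), with edges given as (src, dst, etype)
    triples in local node IDs. Returns the merged edge list in global IDs
    and each graph's [start, end) range of global node IDs.
    """
    merged_edges, ranges, offset = [], [], 0
    for num_nodes, edges in graphs:
        ranges.append((offset, offset + num_nodes))
        merged_edges.extend((s + offset, d + offset, t) for s, d, t in edges)
        offset += num_nodes
    return merged_edges, ranges


def predict_adjacency(num_nodes, num_etypes, score_fn, threshold=0.5):
    """Build one predicted adjacency matrix per edge type for an unseen graph.

    score_fn(src, dst, etype) -> probability is a hypothetical placeholder
    for the trained model's scoring function.
    """
    adj = [[[0] * num_nodes for _ in range(num_nodes)] for _ in range(num_etypes)]
    for t in range(num_etypes):
        for s in range(num_nodes):
            for d in range(num_nodes):
                if s != d and score_fn(s, d, t) >= threshold:
                    adj[t][s][d] = 1
    return adj


# Two toy graphs with 3 and 2 nodes.
graphs = [(3, [(0, 1, 0), (1, 2, 1)]), (2, [(0, 1, 0)])]
edges, ranges = merge_graphs(graphs)

# Toy scorer that favors edges s -> s + 1 of type 0.
adj = predict_adjacency(3, 2, lambda s, d, t: 0.9 if d == s + 1 and t == 0 else 0.1)
```

Keeping the per-graph ID ranges around makes it possible to restrict sampling to a single component later. Note that this sketch leaves open how node embeddings are obtained for a test graph with no observed edges; that depends on the node features and the model, which the post does not specify.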

Could you give me some pointers to solve this problem? Thank you.

czkkkkkk · January 18, 2024, 1:39am

Hi @stefanschutz, I think you could try DGL's GraphBolt module to write a custom negative sampler. See 6.3 Training GNN for Link Prediction with Neighborhood Sampling — DGL 2.1 documentation
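
As a rough illustration of the logic such a custom negative sampler might implement, here is a plain-Python sketch. It does not use GraphBolt's API; the function name, the `(src, dst, etype)` edge format, and the `ranges` list of per-graph node-ID intervals are all assumptions. For each positive edge it corrupts the destination with a node from the same original graph, so no negative ever crosses a component boundary.

```python
import random


def sample_negatives(pos_edges, ranges, num_neg_per_pos, seed=0):
    """Draw negatives (src, dst', etype) for each positive (src, dst, etype).

    dst' is taken from the same component as src, and the corrupted edge
    is never one of the known positives.
    """
    rng = random.Random(seed)
    positives = set(pos_edges)
    negatives = []
    for src, dst, etype in pos_edges:
        # Locate the component (original graph) that contains src.
        start, end = next((a, b) for a, b in ranges if a <= src < b)
        candidates = [
            v for v in range(start, end)
            if v != src and (src, v, etype) not in positives
        ]
        # Small components may offer fewer candidates than requested.
        k = min(num_neg_per_pos, len(candidates))
        negatives.extend((src, v, etype) for v in rng.sample(candidates, k))
    return negatives


# Hypothetical super-graph: two components with node ranges [0, 4) and [4, 7).
ranges = [(0, 4), (4, 7)]
pos_edges = [(0, 1, 0), (2, 3, 1), (4, 5, 0)]
neg_edges = sample_negatives(pos_edges, ranges, num_neg_per_pos=2)
```

Because negatives are drawn per positive edge, each component receives negatives roughly in proportion to its number of positives, which is one possible answer to the distribution question. Sampling uniformly over the whole super-graph would instead produce mostly cross-graph pairs, which are trivially negative.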

stefanschutz · February 5, 2024, 8:57am

Thank you. It seems to be a very good direction to use this module. I will keep you updated.

system · March 6, 2024, 8:58am

This topic was automatically closed 30 days after the last reply. New replies are no longer allowed.