Hello everyone, I have a question regarding a link prediction task. I have a dataset of multiple heterogeneous graphs. I want to train a GNN model on a set of graphs and then perform link prediction on completely unseen graphs in the test set, i.e. predict their adjacency matrices. Some information about the dataset:
- 900 heterogeneous graphs (1 node type, 10 edge types).
- There can be multiple edges of different types between any pair of nodes (at most 1 per edge type).
- Graphs are very sparse: for each edge type, the existing edges make up <1% of all possible links between pairs of nodes. This results in a heavily imbalanced classification problem if the model has to output a complete adjacency matrix, including all edge types, for a new graph.
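To make the imbalance concrete, here is a small sketch of how I would compute the negative-to-positive ratio per edge type for a single graph. It is plain Python and not tied to any library. `imbalance_ratios` is a name I made up, and I am assuming directed edges without self-loops, which may not match the actual data.

```python
def imbalance_ratios(num_nodes, edges_per_type):
    """Return the negatives/positives ratio per edge type for one graph.

    edges_per_type maps an edge type name to a list of (src, dst) pairs.
    Assumption: edges are directed and self-loops are not allowed.
    """
    possible = num_nodes * (num_nodes - 1)  # all ordered pairs, no self-loops
    ratios = {}
    for etype, edges in edges_per_type.items():
        n_pos = len(set(edges))  # duplicates are counted once
        n_neg = possible - n_pos
        # No positives at all: the ratio is undefined, so store None.
        ratios[etype] = (n_neg / n_pos) if n_pos else None
    return ratios
```

At a density of 1% this ratio is 99 negatives per positive. Such a ratio is sometimes used as a positive-class weight in a binary cross-entropy loss, although I am not sure that is the right remedy here.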
My questions are:
- Is there an example of training on multiple heterogeneous graphs for the link prediction task within the library? I have looked but did not find one suitable for my use case.
- One idea is to merge all the graphs into a single super-graph in which each original graph is a disconnected component, and then train link prediction on that super-graph. However, I do not know how to distribute the positive/negative sampling over the super-graph (e.g. equally within each disconnected component, or randomly over the whole super-graph), and I cannot figure out how to perform inference on a new graph after training.
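To illustrate the super-graph idea, here is a minimal sketch of what I have in mind, in plain Python rather than any particular GNN library. All function names (`merge_graphs`, `sample_negatives`, `candidate_pairs`) are my own and hypothetical. I am assuming each graph is given as `(num_nodes, edges)` with edges as `(src, dst, edge_type)` triples in local node ids, and that edges are directed. The negative sampler corrupts only the destination node and stays inside the component of the positive edge, which is one of the two options I am unsure about.

```python
import random


def merge_graphs(graphs):
    """Build a disjoint union by shifting node ids with a running offset.

    Returns (total_nodes, merged_edges, node_graph), where node_graph[i]
    is the index of the original graph that global node i came from.
    """
    total = 0
    merged_edges = []
    node_graph = []
    for gid, (num_nodes, edges) in enumerate(graphs):
        merged_edges.extend((s + total, d + total, t) for s, d, t in edges)
        node_graph.extend([gid] * num_nodes)
        total += num_nodes
    return total, merged_edges, node_graph


def sample_negatives(graphs, k, rng):
    """Draw up to k negatives per positive edge, within the same component."""
    negatives = []
    offset = 0
    for num_nodes, edges in graphs:
        positives = set(edges)
        for s, _, t in edges:
            drawn = 0
            for _attempt in range(20 * k):  # bounded retries, may return fewer than k
                if drawn == k:
                    break
                c = rng.randrange(num_nodes)  # candidate destination, local id
                if c != s and (s, c, t) not in positives:
                    negatives.append((s + offset, c + offset, t))
                    drawn += 1
        offset += num_nodes
    return negatives


def candidate_pairs(num_nodes, num_edge_types):
    """Enumerate every (src, dst, edge_type) to score for one unseen graph."""
    for t in range(num_edge_types):
        for s in range(num_nodes):
            for d in range(num_nodes):
                if s != d:
                    yield (s, d, t)
```

With this layout no sampled negative ever connects two different original graphs. For inference, my guess is that a trained model would score every triple from `candidate_pairs` for the new graph alone and threshold the scores per edge type, but I do not know whether that is the recommended approach.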
Could you give me some pointers to solve this problem? Thank you.