I’m using a GNN for link prediction on protein graphs. Each graph is a directed, connected chain that starts at a first atom and runs all the way to a last atom, and there is only one way to traverse it; you can think of it as a linked list. The dataset contains thousands of these protein graphs. I plan to train on many disconnected graphs and then test the model on unseen graphs where only the nodes are given and the GNN has to predict the links.
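To make the setup concrete, here is a minimal plain-Python sketch of the graph structure described above. It assumes each protein's nodes are already indexed in traversal order (0 to N-1), so the true edges are simply (i, i+1). Training on several proteins at once is modeled by merging them into one disconnected graph with offset node IDs, which is roughly what DGL's `dgl.batch` does. The helper names `chain_edges` and `batch_chains` are hypothetical.

```python
from typing import List, Tuple

Edge = Tuple[int, int]

def chain_edges(num_nodes: int, offset: int = 0) -> List[Edge]:
    """Directed edges of a linked-list graph: node i -> node i + 1.

    `offset` shifts all node IDs so several chains can share one ID space.
    """
    return [(offset + i, offset + i + 1) for i in range(num_nodes - 1)]

def batch_chains(sizes: List[int]) -> List[Edge]:
    """Merge several chain graphs into one disconnected graph.

    Each graph's node IDs start right after the previous graph's last ID,
    so no edge ever connects two different proteins.
    """
    edges: List[Edge] = []
    offset = 0
    for n in sizes:
        edges.extend(chain_edges(n, offset))
        offset += n
    return edges

if __name__ == "__main__":
    # Two toy "proteins" with 3 and 4 atoms.
    print(batch_chains([3, 4]))  # [(0, 1), (1, 2), (3, 4), (4, 5), (5, 6)]
```

Each chain of N nodes contributes exactly N - 1 positive edges, which is why the positive set is so small compared with the candidate pairs.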
In this case there are only (# nodes - 1) true (positive) edges, but the number of negative edges is huge. I’m reading about EdgeSampler (https://docs.dgl.ai/en/0.4.x/api/python/sampler.html), but I’m still unsure what settings are best for my case.
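To quantify the imbalance: with N nodes, directed edges and no self-loops there are N(N-1) ordered pairs, and removing the N-1 true edges leaves (N-1)^2 negatives, so a 50-atom chain has 49 positives against 2401 negatives. A common way around this is to sample a fixed number k of negatives per positive instead of using all of them. The sketch below is plain Python, not DGL's EdgeSampler API; it assumes links never cross proteins, so it corrupts only the destination of each true edge (i, i+1) with a node drawn uniformly from the same graph, excluding i itself and the true successor. The names `num_negative_pairs` and `sample_negatives` are hypothetical.

```python
import random
from typing import List, Tuple

Edge = Tuple[int, int]

def num_negative_pairs(num_nodes: int) -> int:
    """Ordered non-self pairs that are not chain edges: N(N-1) - (N-1) = (N-1)^2."""
    return (num_nodes - 1) ** 2

def sample_negatives(num_nodes: int, k: int, rng: random.Random) -> List[Edge]:
    """Draw k corrupted edges (i, j) for every true edge (i, i + 1).

    j is uniform over the same graph's nodes, rejecting j == i (self-loop)
    and j == i + 1 (the true successor). Sampling is with replacement,
    so duplicate negatives are possible.
    """
    if num_nodes < 3:
        # With 2 nodes there is no valid negative destination.
        return []
    negatives: List[Edge] = []
    for i in range(num_nodes - 1):
        for _ in range(k):
            j = rng.randrange(num_nodes)
            while j == i or j == i + 1:  # rejection sampling
                j = rng.randrange(num_nodes)
            negatives.append((i, j))
    return negatives

if __name__ == "__main__":
    rng = random.Random(0)
    negs = sample_negatives(num_nodes=50, k=5, rng=rng)
    print(len(negs), num_negative_pairs(50))  # 245 2401
```

The ratio k is the main knob: it trades training cost against how well the model learns to rank the true successor above the many alternatives. Uniform sampling is only one option; it tends to produce easy negatives, which is worth keeping in mind when the model has to pick exactly one successor per node at test time.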
Does anyone have suggestions for link prediction in this setting, especially strategies for negative sampling?