Increase link numbers in a graph

Hello, I’ve trained a link prediction model using DGL, and I am wondering how to use this model to add more links to my graph.

Do you want to obtain node embeddings from the trained model and then link the nodes whose embeddings have a high dot product?

As mentioned in 5.3 Link Prediction — DGL 0.8 documentation, node embeddings could be obtained. There are multiple ways of using the node embeddings. Examples include training downstream classifiers, or doing nearest neighbor search or maximum inner product search for relevant entity recommendation.
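To make the maximum inner product search idea concrete, here is a minimal brute-force sketch in NumPy. It assumes the embeddings have already been converted to a NumPy array (for a PyTorch tensor, something like node_embeddings.detach().cpu().numpy()). The function name top_k_inner_product and the toy embeddings are hypothetical, chosen only for illustration.

```python
import numpy as np

def top_k_inner_product(embeddings: np.ndarray, k: int) -> np.ndarray:
    """For every node, return the indices of the k other nodes with the
    largest dot product. Brute force: builds the full (N, N) score matrix."""
    scores = embeddings @ embeddings.T        # pairwise dot products
    np.fill_diagonal(scores, -np.inf)         # exclude each node's match with itself
    # argpartition places the k largest entries of each row in the last k columns
    top = np.argpartition(scores, -k, axis=1)[:, -k:]
    # order those k candidates by descending score
    row = np.arange(scores.shape[0])[:, None]
    order = np.argsort(-scores[row, top], axis=1)
    return top[row, order]

# Toy embeddings standing in for the model output (illustrative values only)
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
neighbors = top_k_inner_product(emb, k=1)
print(neighbors)
```

Because this materializes an N x N matrix, it is only practical for small graphs. For large ones, an approximate nearest neighbor library would probably be a better fit.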

Yep. I’ve already trained a link prediction model, and I use the dot product as the score in the loss. But I am confused about how I can actually use this model to add links to my graph.

Then did you try obtaining node embeddings as in the link I shared, node_embeddings = model.sage(graph, node_features), and then linking the nodes whose embeddings have a high dot product? Is that what you mean by increasing the number of links in the graph? If not, could you explain more about what you mean?

So basically I have a link prediction model and want to use it to predict whether there should be links between currently unconnected nodes. But I don’t know how to apply my model to iterate over those unconnected node pairs. Do I need a dataloader or something?

Once you have the node embeddings, go through all node pairs directly, compute the dot product for each pair, and link the pair if the value is greater than a threshold. There are N*(N-1)/2 pairs.

Yep, that’s my problem. I am confused about how to go through all node pairs. Could you please give me an example showing how to do this?

The version below works, but it is not efficient:

    import torch

    node_embeddings = model.sage(graph, node_features)
    N = graph.num_nodes()
    u, v = [], []
    for i in range(N):
        for j in range(i + 1, N):
            # dot product of the two node embeddings
            value = torch.dot(node_embeddings[i], node_embeddings[j])
            if value > threshold:
                u.append(i)
                v.append(j)
    # add all selected pairs as new edges in one call
    graph.add_edges(u, v)
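The same thresholding can be done with matrix products, one block of rows at a time, which avoids the Python double loop while keeping memory bounded. This is only a sketch in NumPy, assuming the embeddings fit in memory as an array. The function name candidate_edges and the block_size parameter are hypothetical.

```python
import numpy as np

def candidate_edges(embeddings: np.ndarray, threshold: float, block_size: int = 1024):
    """Return arrays (u, v) with u < v whose embedding dot product exceeds threshold."""
    n = embeddings.shape[0]
    us, vs = [], []
    for start in range(0, n, block_size):
        stop = min(start + block_size, n)
        # scores of rows [start, stop) against every node: shape (stop - start, n)
        scores = embeddings[start:stop] @ embeddings.T
        rows, cols = np.nonzero(scores > threshold)
        rows = rows + start          # convert block-local row indices to node IDs
        keep = rows < cols           # keep each unordered pair once, drop self-pairs
        us.append(rows[keep])
        vs.append(cols[keep])
    return np.concatenate(us), np.concatenate(vs)

# Toy embeddings standing in for the model output (illustrative values only)
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
u, v = candidate_edges(emb, threshold=0.5, block_size=2)
print(u, v)
```

The work is still quadratic in the number of nodes, and the sketch does not skip pairs that are already connected. The resulting u and v could then be passed to graph.add_edges, for example after converting them to tensors.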

Cool, I got the idea. I am still wondering if there is an efficient way to do this, because my graph is pretty huge. By the way, I followed the Stochastic Training of GNNs tutorial to train my model. How do I obtain node embeddings in this case?

model.gcn(graph, node_features)?

As for efficiency on a large graph, how about adding candidate edges first, then calling graph.apply_edges(dgl.function.u_dot_v('x', 'x', 'score')) to compute the dot products on those edges, and then removing the edges that have a low dot product? Since N (num_nodes) is very large, one option may be to sample some node pairs and check whether a link should exist, rather than going through all possible pairs (~N^2).
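A rough sketch of the sampling idea, written in NumPy as a stand-in for the graph-based version: it draws random node pairs, computes the per-pair dot product (the same quantity u_dot_v would produce on each candidate edge), and keeps only the pairs above the threshold. The function name score_sampled_pairs and its parameters are hypothetical, and uniform random sampling is just one possible way to choose candidates.

```python
import numpy as np

def score_sampled_pairs(embeddings: np.ndarray, num_samples: int,
                        threshold: float, seed: int = 0):
    """Sample random node pairs and keep those whose dot product exceeds threshold."""
    rng = np.random.default_rng(seed)
    n = embeddings.shape[0]
    u = rng.integers(0, n, size=num_samples)
    v = rng.integers(0, n, size=num_samples)
    mask = u != v                                  # drop self-pairs
    u, v = u[mask], v[mask]
    # store each pair as (smaller ID, larger ID) and remove duplicates
    pairs = np.unique(np.stack([np.minimum(u, v), np.maximum(u, v)], axis=1), axis=0)
    u, v = pairs[:, 0], pairs[:, 1]
    # row-wise dot product between the two endpoints of every candidate pair
    scores = np.einsum("ij,ij->i", embeddings[u], embeddings[v])
    keep = scores > threshold
    return u[keep], v[keep], scores[keep]

# Toy embeddings standing in for the model output (illustrative values only)
emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
u, v, s = score_sampled_pairs(emb, num_samples=200, threshold=0.5)
print(u, v, s)
```

Only the sampled pairs are ever scored, so links between pairs that were never sampled will be missed. That is the accuracy trade-off of this approach.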

Hmm, yes, that sounds like a reasonable compromise, but I am worried about the accuracy if we do it this way.