Node Features in GNN Link Prediction

Hi there, I am currently working on a link prediction project using GNNs. So far I have looked at the following models: GCN, GraphSAGE, GAT, GIN and PGNN. From what I have observed, the authors of these models usually evaluate link prediction on datasets that come with explicit node features (e.g. bag-of-words vectors in Cora or TF-IDF vectors in PubMed). The models also accept structural features such as node degrees or centrality measures; however, the results I get with those are usually not promising (around 70% test AUC). The datasets I have gathered for my project have no node features at all. So my question is: do these models really need datasets with explicit node features (like Cora), or is there a way to reach high performance without them, e.g. by handcrafting my own node features?


In general, the performance of graph neural networks can depend heavily on the quality of the node features. Nevertheless, when no features are given, you can still construct node features from the graph structure alone, for example:

  • topological features like node degrees
  • node embeddings learned by network embedding approaches, e.g. DeepWalk, node2vec or LINE.
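As a rough sketch of both options, assuming the graph is an undirected edge list with integer (or otherwise sortable) node IDs, the pure-Python code below computes a few topological features per node and generates the uniform truncated random walks that DeepWalk uses as its training corpus. The helper names (`build_adjacency`, `structural_features`, `random_walks`) and the particular feature choice are hypothetical. DeepWalk's second stage, training a skip-gram model on these walks (for example with gensim's `Word2Vec` and `sg=1`), is not included here.

```python
import random
from collections import defaultdict


def build_adjacency(edges):
    """Build undirected adjacency sets from (u, v) pairs; self-loops are skipped."""
    adj = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue
        adj[u].add(v)
        adj[v].add(u)
    return adj


def structural_features(adj):
    """Hypothetical feature set per node: [degree, clustering coeff., mean neighbour degree]."""
    feats = {}
    for node, nbrs in adj.items():
        deg = len(nbrs)
        # Count edges between pairs of neighbours, each unordered pair once
        # (assumes node IDs are comparable with <).
        links = sum(1 for a in nbrs for b in nbrs if a < b and b in adj[a])
        clustering = 2 * links / (deg * (deg - 1)) if deg > 1 else 0.0
        mean_nbr_deg = sum(len(adj[n]) for n in nbrs) / deg if deg else 0.0
        feats[node] = [float(deg), clustering, mean_nbr_deg]
    return feats


def random_walks(adj, walks_per_node=10, walk_length=20, seed=0):
    """Uniform truncated random walks (DeepWalk-style corpus generation only)."""
    rng = random.Random(seed)
    nodes = sorted(adj)
    walks = []
    for _ in range(walks_per_node):
        rng.shuffle(nodes)  # visit start nodes in a new random order each pass
        for start in nodes:
            walk = [start]
            while len(walk) < walk_length:
                nbrs = adj[walk[-1]]
                if not nbrs:
                    break
                # Each step moves to a uniformly chosen neighbour.
                walk.append(rng.choice(sorted(nbrs)))
            walks.append(walk)
    return walks


if __name__ == "__main__":
    edges = [(0, 1), (1, 2), (2, 0), (2, 3)]
    adj = build_adjacency(edges)
    print(structural_features(adj)[2])
    walks = random_walks(adj, walks_per_node=2, walk_length=5)
    print(len(walks), walks[0])
```

Only the training edges should go into `build_adjacency`: computing features or walks on a graph that still contains the held-out test edges leaks those edges into the model and inflates the test AUC. node2vec differs from this sketch by biasing each walk step with its return and in-out parameters `p` and `q` instead of choosing neighbours uniformly.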