How to predict the directional edges in a heterogeneous network?

cs001632 · April 11, 2021, 3:12pm

Our input data is a table of candidate (potential) edges and a list of nodes. The ground truth is a set of edges that are known to exist.
For example:
edgetable.csv (potential edges)
dst src attr
A B AtoB
A B BtoA
A C AtoC
B D BtoD

nodetable
node type Attr
A type2 5
B type1 3
C type1 2
D type2 5

groundtruth
B D negative

Our purpose is twofold:

1. Determine whether there is a relationship between two types of nodes.
2. Determine the direction of that relationship.

We know that RGCN/GraphSAGE might do it, but we don’t know how to design the training, validation, and test datasets. Could you kindly recommend relevant ideas and tutorials?

BarclayII · April 12, 2021, 7:20am

You can treat a relationship in each direction as a separate relationship (i.e. a relationship and its “reverse” relationship), and edges in different directions as different edges (i.e. the edge from u to v and the edge from v to u are distinct).
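
As a rough illustration of this idea (plain Python, not DGL code), the sketch below groups directed edges by a (source type, relation name, destination type) triple, so that the two directions between the same pair of node types end up under different relations. The node types come from the nodetable in the question. The relation names such as `type2_to_type1` are made up for illustration, and reading "AtoB" as "A is the source" is an assumption about the edge table.

```python
from collections import defaultdict

# Node types copied from the nodetable in the question.
node_type = {"A": "type2", "B": "type1", "C": "type1", "D": "type2"}

# Directed candidate edges as (source, destination) pairs.
# Assumption: an attribute like "AtoB" means A is the source and B the destination.
directed_edges = [("A", "B"), ("B", "A"), ("A", "C"), ("B", "D")]


def build_relations(edges, node_type):
    """Group directed edges by (source type, relation name, destination type)."""
    relations = defaultdict(list)
    for u, v in edges:
        src_t, dst_t = node_type[u], node_type[v]
        # Hypothetical naming scheme: one relation name per direction.
        key = (src_t, f"{src_t}_to_{dst_t}", dst_t)
        relations[key].append((u, v))
    return dict(relations)


relations = build_relations(directed_edges, node_type)
for key, pairs in relations.items():
    print(key, pairs)
```

With this layout, the edge from A to B and the edge from B to A fall under two different relations, so a link prediction model can score each direction separately.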

For link prediction on a heterogeneous graph, you can refer to dgl/link_predict.py at master · dmlc/dgl · GitHub.

cs001632 · April 12, 2021, 12:20pm

Thank you for your kind suggestion. However, we have no predefined training, validation, and test sets for our samples. How would you suggest setting up those sets?

BarclayII · April 13, 2021, 7:23am

The most common strategy is to split the edges uniformly at random into training, validation, and test sets, in something like an 8:1:1 ratio. Depending on your data, task, and use case, you may want a specific strategy other than uniform splitting. I’m not sure what exactly your case is, so I can only give some examples:

If your use case is predicting future edges, you may need to split training/validation/test set according to timestamps.
If your use case is to predict connections to/from unseen nodes, then your edge split should be grouped by their incident nodes instead.
If your dataset is tiny, like a hundred edges or so, then cross-validation might be your best bet.
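
To make the first two strategies concrete, here is a minimal sketch in plain Python. The function names `split_edges` and `split_edges_by_time` are hypothetical helpers, not part of DGL, and the timestamped edge format `(u, v, t)` is an assumption about how the data might be stored.

```python
import random


def _cut(items, ratios):
    """Cut an ordered list into three consecutive parts according to ratios."""
    if abs(sum(ratios) - 1.0) > 1e-9:
        raise ValueError("ratios must sum to 1")
    n = len(items)
    n_train = int(ratios[0] * n)
    n_val = int(ratios[1] * n)
    # The test set takes whatever remains after training and validation.
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])


def split_edges(edges, ratios=(0.8, 0.1, 0.1), seed=0):
    """Uniform split: shuffle the edges, then cut into train/validation/test."""
    shuffled = list(edges)
    random.Random(seed).shuffle(shuffled)
    return _cut(shuffled, ratios)


def split_edges_by_time(edges_with_time, ratios=(0.8, 0.1, 0.1)):
    """Temporal split: oldest edges for training, newest for testing.

    Each edge is assumed to be a (u, v, timestamp) tuple.
    """
    ordered = sorted(edges_with_time, key=lambda e: e[2])
    return _cut(ordered, ratios)


if __name__ == "__main__":
    edges = [(i, i + 1) for i in range(100)]
    train, val, test = split_edges(edges)
    print(len(train), len(val), len(test))
```

The temporal variant never shuffles, so every training edge is at least as old as every validation edge, which matches the "predict future edges" use case.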

cs001632 · April 13, 2021, 7:43am

Your timely reply is helpful! Specifically, our goal is to select the most essential interactions (edges) in edgetable.csv. We plan to compute a weight for each edge with a GNN and rank the edges by that weight. Might unsupervised GraphSAGE be helpful (dgl/train_sampling_unsupervised.py at master · dmlc/dgl · GitHub)? However, since it is an unsupervised method, why does this code contain validation and test parts?

cs001632 · April 26, 2021, 1:05pm

We do not have enough known direction information for training. Could you kindly recommend an unsupervised method to infer the direction?

BarclayII · April 28, 2021, 7:17am

If you have very few labels, you can adapt traditional unsupervised learning methods or semi-supervised deep learning methods. The following might give you an idea of how to adapt existing approaches to your problem. It’s just an example, so I can’t guarantee that it will work for your case.

For instance, a typical strategy for dealing with few labels is to first learn a general representation of each data instance (e.g. using an autoencoder), and then train a small model (like a linear classifier) on the learned representations. To adapt this to your problem, you can:

First learn a general representation of each edge, which can be expressed as a function of learned general node representations and edge features.
There are multiple ways to learn general node representations, e.g. using GraphSAGE or Graph Autoencoders.
Once the general node representations are learned, you can then train a simple linear classifier that takes the incident node representations (and edge features if applicable) as inputs. You may also want to consider higher-order features, as you would in feature engineering.
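
The second stage of the steps above could look roughly like the sketch below. It assumes node embeddings have already been learned elsewhere (e.g. by GraphSAGE); the embedding values and the four labeled edges here are made-up placeholders, and the tiny logistic regression is written in plain Python only to keep the example self-contained. The edge representation is simply the source embedding concatenated with the destination embedding, so swapping the two nodes gives a different input and the classifier can distinguish directions.

```python
import math

# Placeholder node embeddings standing in for the output of an
# unsupervised model such as GraphSAGE (values are made up).
emb = {"A": [1.0, 0.0], "B": [0.0, 1.0], "C": [0.2, 0.9], "D": [0.9, 0.1]}

# A handful of hypothetical labeled directed edges: 1 = exists, 0 = does not.
labeled = [(("A", "B"), 1), (("B", "A"), 0), (("D", "C"), 1), (("C", "D"), 0)]


def edge_features(u, v):
    """Concatenate source and destination embeddings; the order encodes direction."""
    return emb[u] + emb[v]


def predict(w, b, x):
    """Logistic regression: probability that the directed edge exists."""
    z = sum(wi * xi for wi, xi in zip(w, x)) + b
    return 1.0 / (1.0 + math.exp(-z))


def train(samples, lr=0.5, epochs=500):
    """Fit a linear classifier on edge features with plain SGD."""
    w, b = [0.0] * len(samples[0][0]), 0.0
    for _ in range(epochs):
        for x, y in samples:
            g = predict(w, b, x) - y  # gradient of the log loss w.r.t. z
            w = [wi - lr * g * xi for wi, xi in zip(w, x)]
            b -= lr * g
    return w, b


samples = [(edge_features(u, v), y) for (u, v), y in labeled]
w, b = train(samples)

# Score both directions of an unlabeled candidate pair and rank them.
scores = {(u, v): predict(w, b, edge_features(u, v))
          for u, v in [("A", "C"), ("C", "A")]}
for edge, p in sorted(scores.items(), key=lambda kv: -kv[1]):
    print(edge, round(p, 3))
```

Scores of this kind can also serve as the edge weights for ranking candidate edges, and any edge attributes could be appended to the feature vector in `edge_features`.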
