Implementing multidimensional edge features in node embedding

Simon_eck · July 22, 2024, 5:15pm

Hi there,

I am working on a project to build a recommendation system using a GNN, first with node classification and later with link prediction.

I have created a synthetic dataset with two node types (students and tasks). I want to store information in the nodes (node features) and also in the edges (multidimensional edge features describing a student's attempts at a task). I have not yet found a way to replicate the labeling logic without also storing the edge information in the nodes. Does anyone know how I can build a GNN that uses the edge information directly in its decision, or do I need to change my approach?

I found layers like EGATConv or EdgeConv, but as far as I can tell they only use the edge information to compute the attention, so the information is only used indirectly. I feel I need to embed the edge information into the node features somehow (while also knowing which task the person was doing), because I could not find a layer that is capable of using the data structure I would like to use.

Here is some more information about the project and the structure of the data:

I want to use a GNN as a recommendation system to support education. There are methods like knowledge tracing, and I want to do something similar. However, before I do that, I first need to set up a GNN model that is capable of replicating node classification logic.

Here is the setting:

It is a synthetic dataset that uses logic to create a label for the nodes - I want to replicate this logic through the GNN.
There are task nodes and trainee nodes.
- There is a feature matrix with two columns for all nodes: The first column is 1 if the node is a person, otherwise 0. The second column is 1 if the node is a task, otherwise 0.
Some trainees are connected to tasks by edges (if a trainee completes a task, they are connected).
- Which nodes are connected is stored in the edge index tensor.
- Each edge has a two-dimensional feature vector, stored in the edge feature tensor:
  - First information: Grade between 1-5, normalized between 0 and 1 (0, 0.25, 0.5, 0.75, 1).
  - Second information: If the task was the last task the trainee did, then 1, otherwise 0.
Each person node has a label to be predicted by the GNN through node classification.
- The label is the grade of the last task they did. So if the person had three edges and the last edge (where the second piece of information is 1) had a grade of 0.5, the label is 0.5.
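
To make the setting above concrete, here is a minimal sketch of how such a dataset could be generated with plain Python lists. The function name `make_dataset`, the node ordering (trainees first, then tasks), and the random sampling of tasks and grades are assumptions for illustration, not taken from the original dataset.

```python
import random

GRADES = [0.0, 0.25, 0.5, 0.75, 1.0]  # grades 1-5 normalized to [0, 1]


def make_dataset(num_trainees=4, num_tasks=3, seed=0):
    """Build the toy graph described above (hypothetical helper)."""
    rng = random.Random(seed)
    # Node features [is_person, is_task]; trainees get ids 0..num_trainees-1.
    node_features = [[1, 0] for _ in range(num_trainees)]
    node_features += [[0, 1] for _ in range(num_tasks)]
    edge_index = [[], []]  # [trainee ids, task ids]
    edge_features = []     # one [grade, is_last_task] row per edge
    labels = []            # one label per trainee
    for trainee in range(num_trainees):
        # Assumption: each trainee does a random non-empty subset of tasks.
        attempted = rng.sample(range(num_tasks), rng.randint(1, num_tasks))
        for position, task in enumerate(attempted):
            grade = rng.choice(GRADES)
            is_last = 1 if position == len(attempted) - 1 else 0
            edge_index[0].append(trainee)
            edge_index[1].append(num_trainees + task)
            edge_features.append([grade, is_last])
            if is_last:
                # The label is the grade on the "last task" edge.
                labels.append(grade)
    return node_features, edge_index, edge_features, labels
```

The lists can be converted to tensors afterwards; the point is only that the label is fully determined by the edge features and not by anything stored in the nodes.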

As you can see, the logic is very simple. I started with much more complex graph data, but could never get my model to work without also storing the grades in the trainee nodes, whereas I would like to store them in the edge features.
Overview of the dataset: (screenshot)

Simon_eck · July 22, 2024, 8:14pm

I know that there are some layers which can handle multi-dimensional edge features:
https://pytorch-geometric.readthedocs.io/en/latest/notes/cheatsheet.html
https://www.google.com/search?client=safari&sca_esv=2a19a3414e05e997&rls=en&q=multidimensional+edge+features&tbm=vid&source=lnms&fbs=AEQNm0COtQ6qE5snXClm_cWqGTLX_jMP5V4l2v9LemFtanifXUj1LD6QCINf2Stcfc55fHi_K0iAiH4y_ML3L3eGQg5PEfGyozrH18UyDG1K7hKluirqLXn3W4xbubFf-JlW-rLHwHKtknQStyHUHurmuREA-EcGQvGc1tkuTPadS97hRIQtzGEDv3xnqbV6_U7TsoNyXI0QvhsymvxWjRji2eJpE7C0mg&sa=X&ved=2ahUKEwjNh4fquruHAxVyAzQIHY6EBd8Q0pQJegQIDRAB&biw=1920&bih=1000&dpr=1#

But somehow I can't get this to work in my code. Does anyone have experience with this, or know of examples of people doing something like that?

Simon_eck · July 23, 2024, 5:37pm

I also found another way of handling edge features (Peer-inspired Student Performance Prediction in Interactive Online Question Pools with Graph Neural Network): inserting helper nodes. But that also brings complications and makes my dataset less structured, so I would like to see if there is another way:
(helping node idea)
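
One possible reading of the helper-node idea, sketched in plain Python: every featured edge u -> v is replaced by u -> w -> v, where w is a new node that carries the former edge feature as its node feature. This is an assumption about the general idea, not necessarily how the cited paper builds its graph, and `add_helper_nodes` is a hypothetical name.

```python
def add_helper_nodes(num_nodes, edge_index, edge_features):
    """Replace each edge u -> v with u -> w -> v via a new helper node w."""
    src, dst = edge_index
    new_src, new_dst = [], []
    helper_features = {}  # helper node id -> former edge feature
    next_id = num_nodes   # helper ids start after the existing node ids
    for u, v, feat in zip(src, dst, edge_features):
        w = next_id
        next_id += 1
        helper_features[w] = list(feat)
        new_src += [u, w]
        new_dst += [w, v]
    return [new_src, new_dst], helper_features
```

The trade-off mentioned above shows up directly: the number of nodes grows by the number of edges, and trainee and task are now two hops apart instead of one.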

Simon_eck · July 24, 2024, 9:49pm

So far I can only achieve this by storing the grade information in the trainee nodes. That doesn't show much, because all the information that is needed is already in the trainee node. I haven't yet shown that there is any benefit from the information available through the other nodes (message passing doesn't bring in any needed information, because it is already in the node). So the next step for me is to try storing the important information in these helper nodes and see if message passing is able to help. If anyone is familiar with this topic, it would be great to hear what they think about this problem and whether there is a better way. All the best

Simon_eck · July 24, 2024, 10:43pm

This goes in another direction, working on the message passing: Q&A Question 13

BarclayII · July 31, 2024, 2:59pm

It seems that you are asking for general advice on using multi-dimensional edge features in a GNN? DGL does not have a readily available module for that. However, you could do it with the apply_edges function and a custom edge function:

import dgl
import torch.nn as nn

# The class name is illustrative; the two methods are meant to live in an nn.Module.
class EdgeFeatureLayer(nn.Module):
    def fn(self, edges):
        # Combine the source node feature and edge feature together into a new edge feature 'm'.
        # Elementwise addition requires both features to have the same dimension.
        return {'m': edges.src['node_feature'] + edges.data['edge_feature']}

    def forward(self, g, node_features, edge_features):
        with g.local_scope():
            # Assign the node feature to the source nodes of the graph.
            g.srcdata['node_feature'] = node_features
            # Assign the edge feature to the graph
            g.edata['edge_feature'] = edge_features
            # Fuse source node features onto edge features
            g.apply_edges(self.fn)
            # Do message passing by aggregating the new edge feature 'm' directly
            g.update_all(dgl.function.copy_e('m', 'm'), dgl.function.sum('m', 'h'))
            # Do something more for the returned destination node feature 'h'.  Here I just return it.
            return g.dstdata['h']
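
For readers without DGL at hand, the following is a rough plain-Python sketch of what the snippet above computes: one message per edge, summed at the destination node. The names `sum_messages`, `add_fn`, and `last_grade_fn` are hypothetical. `last_grade_fn` is an assumed alternative edge function for the toy task in this thread, not part of the reply. Note that messages flow from source to destination, so the trainee has to be the destination of the edge (for example by using task -> trainee edges) for its label to depend on the edge features.

```python
def sum_messages(num_nodes, edge_index, node_features, edge_features, message_fn):
    """Compute message_fn(src_feature, edge_feature) per edge, then sum per destination."""
    src, dst = edge_index
    messages = [message_fn(node_features[u], e) for u, e in zip(src, edge_features)]
    dim = len(messages[0])
    # Nodes without incoming edges keep an all-zero result.
    h = [[0.0] * dim for _ in range(num_nodes)]
    for v, m in zip(dst, messages):
        for k in range(dim):
            h[v][k] += m[k]
    return h


def add_fn(src_feature, edge_feature):
    # Same fusion as in the reply: source node feature + edge feature.
    return [s + e for s, e in zip(src_feature, edge_feature)]


def last_grade_fn(src_feature, edge_feature):
    # Assumed alternative: keep the grade only on the "last task" edge.
    grade, is_last = edge_feature
    return [grade * is_last]


# Nodes 0 and 1 are tasks, node 2 is a trainee; edges point task -> trainee.
node_features = [[0, 1], [0, 1], [1, 0]]
edge_index = [[0, 1], [2, 2]]
edge_features = [[0.5, 0], [0.75, 1]]  # [grade, is_last_task]

h_add = sum_messages(3, edge_index, node_features, edge_features, add_fn)
h_last = sum_messages(3, edge_index, node_features, edge_features, last_grade_fn)
```

With the plain sum, the trainee receives a mix of all grades and flags, which a later layer would have to untangle. With the masked variant, the aggregated value for the trainee is exactly the grade of the last task, which suggests the toy label can in principle be recovered from edge features alone.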
