Hi there,
I am working on a project where I am building a recommendation system using a GNN, first for node classification and later for link prediction.
I have created a synthetic dataset with two node types (students and tasks). I want to store information both in the nodes (node features) and in the edges (edge features: a student's trials of a task, as a multidimensional edge feature). So far I have not found a way to replicate the labeling logic without also storing the edge information in the nodes. Does anyone know how I can build a GNN that uses the edge information directly in its prediction, or do I need to change my approach?
I found layers like EGATConv and EdgeConv, but as far as I can tell they only use the edge information to compute the attention, so the information is only used indirectly. I feel I need to embed the edge information into the node features somehow (while also knowing which task the person was doing), because I could not find a layer that is capable of using the data structure I would like to use.
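To make concrete what using the edge information directly (rather than only through an attention weight) could look like, here is a minimal sketch in plain NumPy. It is not any library's layer: the function name `aggregate_with_edge_features` and the toy arrays are made up for illustration, and it assumes edges point from task to student so that messages arrive at the student node.

```python
import numpy as np

def aggregate_with_edge_features(x, edge_index, edge_attr):
    """Sum [source node features || edge features] into each destination node."""
    src, dst = edge_index  # row 0: source nodes, row 1: destination nodes
    # Each message carries the neighbour's features AND the edge's features.
    messages = np.concatenate([x[src], edge_attr], axis=1)  # (E, F_node + F_edge)
    out = np.zeros((x.shape[0], messages.shape[1]))
    np.add.at(out, dst, messages)  # scatter-add messages onto destination nodes
    return out

# Toy graph (hypothetical): node 0 is a student, nodes 1 and 2 are tasks.
x = np.array([[1.0, 0.0], [0.0, 1.0], [0.0, 1.0]])   # [is_person, is_task]
edge_index = np.array([[1, 2], [0, 0]])              # task -> student
edge_attr = np.array([[0.25, 0.0], [0.75, 1.0]])     # [grade, is_last]

h = aggregate_with_edge_features(x, edge_index, edge_attr)
```

After this step, the student's row of `h` contains the summed grades and last-task flags of its incident edges, so a downstream linear layer or MLP can see them. A learned layer would typically pass each message through a small network before summing, so that the grade and the last-task flag can interact.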
Here is some more information about the project and the structure of the data:
I want to use a GNN as a recommendation system to support education. There are methods like knowledge tracing, and I want to do something similar. However, before I do that, I first need to set up a GNN model that is capable of replicating node classification logic.
Here is the setting:
- It is a synthetic dataset that uses logic to create a label for the nodes - I want to replicate this logic through the GNN.
- There are task nodes and trainee nodes.
- There is a feature matrix with two columns for all nodes: The first column is 1 if the node is a person, otherwise 0. The second column is 1 if the node is a task, otherwise 0.
- Some trainees are connected to tasks by edges (if a trainee completes a task, they are connected).
- Which nodes are connected is stored in the edge index tensor.
- Each edge has a two-dimensional feature vector, stored in the edge feature tensor:
  - First entry: grade between 1 and 5, normalized to the range 0 to 1 (0, 0.25, 0.5, 0.75, 1).
  - Second entry: 1 if the task was the last task the trainee did, otherwise 0.
- Each person node has a label to be predicted by the GNN through node classification.
- The label equals the grade of the last task the trainee did. So if a trainee has three edges and the last edge (the one whose second feature is 1) has a grade of 0.5, the label is 0.5.
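The setting above can be sketched as a small generator. This is only an illustration under assumptions: the function name `make_toy_dataset`, its parameters, the random task assignment, and the task-to-trainee edge direction are all hypothetical choices, not the actual dataset code. Task nodes get `NaN` as a placeholder label since only trainees are classified.

```python
import numpy as np

def make_toy_dataset(num_trainees=4, num_tasks=3, tasks_per_trainee=2, seed=0):
    """Build node features, edge index, edge features and labels as described."""
    rng = np.random.default_rng(seed)
    grades = np.array([0.0, 0.25, 0.5, 0.75, 1.0])  # grades 1-5, normalized
    num_nodes = num_trainees + num_tasks

    # Two-column node type indicator: [is_trainee, is_task].
    x = np.zeros((num_nodes, 2))
    x[:num_trainees, 0] = 1.0
    x[num_trainees:, 1] = 1.0

    src, dst, attr = [], [], []
    y = np.full(num_nodes, np.nan)  # only trainee nodes receive a label
    for t in range(num_trainees):
        # Tasks are drawn in the order the trainee "did" them.
        tasks = rng.choice(num_tasks, size=tasks_per_trainee, replace=False)
        for i, task in enumerate(tasks):
            grade = rng.choice(grades)
            is_last = 1.0 if i == tasks_per_trainee - 1 else 0.0
            src.append(num_trainees + task)  # assumed direction: task -> trainee
            dst.append(t)
            attr.append([grade, is_last])
            if is_last:
                y[t] = grade  # label = grade of the last task
    return x, np.array([src, dst]), np.array(attr), y

x, edge_index, edge_attr, y = make_toy_dataset()
```

The arrays can be converted to tensors for whichever GNN library is used. Because the label depends only on the edge flagged as last, a model has to read the edge features to recover it.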
As you can see, the logic is very simple. I started with much more complex graph data, but I could never get my model to work without also storing the grades in the trainee nodes, and I would like to store them only in the edge features.
Overview of the dataset: [image: Screenshot 2024-07-22 at 11.06.11]