Creating a Dataset for Node Classification with Multiple Graphs

YellowAllice · May 2, 2024, 8:20pm

Hi,
I have some questions about loading data.
As shown in the diagram below, my model is for node classification, and categorizes nodes into two types.
My dataset contains multiple graphs, each with a different structure and about ten thousand nodes. Only the nodes have features; the edges and graphs do not.

In the DGL 2.1.x user guide, chapter 4.6, I saw that data can be loaded from CSV files. Which of these two options should I use?

‘Dataset of a single graph with features and labels’: combine all graphs into a single graph
‘Dataset of multiple graphs’

[Image: diagram, 715×234]

Does anyone have better suggestions?

dyru · May 9, 2024, 1:34am

In your case, you want to train node classification on the labeled graphs in the training set and then predict on unseen test graphs, so you should use the ‘Dataset of multiple graphs’ format.

However, your setup differs slightly from the one in the user guide. In your data, graphs.csv does not contain labels; provide the labels in nodes.csv instead. Also include a graph_id column in both nodes.csv and edges.csv so that the nodes and edges of different graphs can be told apart.
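
As a rough sketch of that file layout, the script below writes a toy two-graph dataset using only the Python standard library. The toy values, the dataset name, the `write_dataset` helper, and the `label` and `feat` column names are assumptions for illustration. The meta.yaml keys and a graphs.csv that holds only graph_id reflect my reading of the user guide, so check them against your DGL version before loading the folder with `dgl.data.CSVDataset`.

```python
import csv
import tempfile
from pathlib import Path

# Hypothetical toy data: two small graphs, 2-dimensional node features,
# and a binary node label. Real graphs would have about 10,000 nodes each.
NODES = [
    # (graph_id, node_id, label, feat); node IDs restart at 0 in each graph
    (0, 0, 0, [0.1, 0.2]),
    (0, 1, 1, [0.3, 0.4]),
    (0, 2, 0, [0.5, 0.6]),
    (1, 0, 1, [0.7, 0.8]),
    (1, 1, 0, [0.9, 1.0]),
]
EDGES = [
    # (graph_id, src_id, dst_id)
    (0, 0, 1),
    (0, 1, 2),
    (1, 0, 1),
]

# Assumed meta.yaml layout; verify the keys against the user guide.
META_YAML = """\
dataset_name: multi_graph_node_cls
edge_data:
- file_name: edges.csv
node_data:
- file_name: nodes.csv
graph_data:
  file_name: graphs.csv
"""


def write_dataset(root):
    """Write meta.yaml, nodes.csv, edges.csv and graphs.csv under root."""
    root = Path(root)
    root.mkdir(parents=True, exist_ok=True)
    (root / "meta.yaml").write_text(META_YAML)

    with open(root / "nodes.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["graph_id", "node_id", "label", "feat"])
        for gid, nid, label, feat in NODES:
            # The feature vector goes into one cell as a comma-separated
            # string; csv.writer quotes it because it contains commas.
            writer.writerow([gid, nid, label, ", ".join(map(str, feat))])

    with open(root / "edges.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["graph_id", "src_id", "dst_id"])
        writer.writerows(EDGES)

    with open(root / "graphs.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["graph_id"])  # no graph-level labels or features
        for gid in sorted({row[0] for row in NODES}):
            writer.writerow([gid])

    return root


out_dir = write_dataset(Path(tempfile.mkdtemp()) / "multi_graph_node_cls")
print(sorted(p.name for p in out_dir.iterdir()))
```

Splitting into training and test sets can then be done per graph_id, since the test graphs are meant to be unseen during training.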

During training, you load mini-batches of graphs from the training set and combine each mini-batch into a single graph for learning.
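
To illustrate what combining a batch into a single graph means, here is a minimal pure-Python sketch of a disjoint union: node IDs of each graph are shifted by an offset so the graphs do not overlap, and node features and labels are concatenated in the same order. The `Graph` class, `batch_graphs`, and `iter_minibatches` are hypothetical names for this illustration, not DGL APIs. In DGL itself, `dgl.batch` performs this kind of merge, and `dgl.dataloading.GraphDataLoader` can be used to iterate over batches of graphs.

```python
import random
from dataclasses import dataclass


@dataclass
class Graph:
    num_nodes: int
    edges: list   # (src, dst) pairs with node IDs local to this graph
    feats: list   # one feature vector per node
    labels: list  # one class label (0 or 1) per node


def batch_graphs(graphs):
    """Merge graphs into one disjoint-union graph by offsetting node IDs."""
    edges, feats, labels, offset = [], [], [], 0
    for g in graphs:
        edges.extend((s + offset, d + offset) for s, d in g.edges)
        feats.extend(g.feats)
        labels.extend(g.labels)
        offset += g.num_nodes
    return Graph(offset, edges, feats, labels)


def iter_minibatches(graphs, batch_size, seed=0):
    """Yield one merged graph per mini-batch of training graphs."""
    order = list(range(len(graphs)))
    random.Random(seed).shuffle(order)
    for i in range(0, len(order), batch_size):
        yield batch_graphs([graphs[j] for j in order[i:i + batch_size]])


# Toy training set: a 3-node path graph and a 2-node graph.
g0 = Graph(3, [(0, 1), (1, 2)], [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]], [0, 1, 0])
g1 = Graph(2, [(0, 1)], [[0.7, 0.8], [0.9, 1.0]], [1, 0])

merged = batch_graphs([g0, g1])
print(merged.num_nodes, merged.edges, merged.labels)
```

Because the merged graph has no edges between the original graphs, message passing stays within each graph, and the per-node loss can be computed over all nodes of the batch at once.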
