Split a graph into train and test sets based on connected components


I want to split my graph along its connected components, because I don’t want to break up my network when training my model. The graphs also carry a notion of time. Note that I am doing node classification, not graph classification.
In addition, I would like the training and test sets to have approximately the same proportion of nodes with labels -1, 0 and 1.
Is there a function in DGL that does this?

Thank you !

  1. I believe you can use NetworkX to compute the connected components, and then use DGL’s interfaces to convert the graph structure.
  2. You can implement the rest of your requirements with custom operations and store the result in g.ndata; it doesn’t seem difficult.
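
To make the first point concrete, here is a minimal, dependency-free sketch of labelling every node with a component id. It stands in for what `networkx.connected_components` would give you on an undirected graph. The function name `component_ids`, the toy edge list, and the idea of storing the result under a key such as g.ndata['comp'] are assumptions for illustration, not something DGL prescribes.

```python
def component_ids(num_nodes, edges):
    """Return a list mapping each node to the id of its connected component.

    Edges are treated as undirected. Ids are handed out in order of the
    smallest node in each component.
    """
    parent = list(range(num_nodes))

    def find(x):
        # Path halving keeps the union-find trees shallow.
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for u, v in edges:
        ru, rv = find(u), find(v)
        if ru != rv:
            # Always keep the smaller node as the root of the merged set.
            parent[max(ru, rv)] = min(ru, rv)

    ids = {}
    out = []
    for n in range(num_nodes):
        root = find(n)
        out.append(ids.setdefault(root, len(ids)))
    return out


# Toy graph (hypothetical): three components plus one isolated node.
edges = [(0, 1), (1, 2), (3, 4), (5, 6), (6, 7), (7, 5)]
comp = component_ids(9, edges)
print(comp)  # [0, 0, 0, 1, 1, 2, 2, 2, 3]
```

The resulting list has one entry per node, so it could be converted to a tensor and kept as a node feature next to the labels.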

Hi @BearBiscuit05 and thanks for your answer !

I had the same intuition about your first point, and I have already done it.
The second part is harder, because it needs something like a stratify parameter so that the two resulting subgraphs have roughly the same label proportions overall.

If any of you have an idea on how to implement something like this, I will be happy to hear from you.

It is not that easy to implement in the end.

Thanks !

Actually I don’t quite understand your problem. Do you want to split a graph into three disconnected subgraphs and discard the edges across the subgraphs?

Not at all.

I have one whole graph made up of multiple connected components. I want to assign each connected component to either the training set or the test set. I also want the two sets to have a similar percentage of nodes with label 0 and 1.
Do you have any idea how I could code this?

I see. This looks like a knapsack problem if you regard each connected component as an item carrying some number of label-0 and label-1 nodes. You want to divide the components into two sets so that each set has a similar share of 0s and 1s. DGL doesn’t provide this functionality, so you may need to work out a solution to this knapsack-style problem yourself.
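
As a starting point, here is a rough greedy heuristic for that partition, not an exact knapsack solver. It assumes you already have one component id and one label per node. The function name `split_components`, the `test_frac` parameter, and the toy data are made up for illustration. Components are visited largest first, and each one goes to the test set only if that moves the test set's per-label counts closer to the target share.

```python
from collections import Counter


def split_components(comp_ids, labels, test_frac=0.25):
    """Greedily assign whole components to train or test.

    Returns (train_components, test_components) as sorted lists of ids.
    This is a heuristic: it does not guarantee the best possible balance.
    """
    # Label histogram for every component.
    hist = {}
    for c, y in zip(comp_ids, labels):
        hist.setdefault(c, Counter())[y] += 1

    classes = sorted(set(labels))
    total = Counter(labels)
    # Number of nodes of each label we would like to see in the test set.
    target = {y: test_frac * total[y] for y in classes}
    test_count = {y: 0 for y in classes}

    def cost(counts):
        # Squared distance between current test counts and the target.
        return sum((counts[y] - target[y]) ** 2 for y in classes)

    train, test = [], []
    # Largest components first, so small ones can fix the balance later.
    for c in sorted(hist, key=lambda c: (-sum(hist[c].values()), c)):
        with_c = {y: test_count[y] + hist[c][y] for y in classes}
        if cost(with_c) < cost(test_count):
            test.append(c)
            test_count = with_c
        else:
            train.append(c)
    return sorted(train), sorted(test)


# Toy data (hypothetical): three components with labels -1, 0 and 1.
comp_ids = [0] * 8 + [1] * 4 + [2] * 4
labels = [-1, -1, 0, 0, 0, 0, 1, 1] + [-1, 0, 0, 1] + [-1, 0, 0, 1]
train, test = split_components(comp_ids, labels, test_frac=0.25)
print(train, test)  # [0, 2] [1]

# Per-node boolean masks derived from the component assignment.
test_set = set(test)
test_mask = [c in test_set for c in comp_ids]
train_mask = [not m for m in test_mask]
```

With many small components this tends to land close to the target. With a few large, unbalanced components no split may be good, and it can be worth trying several random orderings and keeping the best one.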

Thanks a lot for your response !

I’ll definitely look into this knapsack problem.
