Predicting congested nodes before they are placed

I will describe my problem and would appreciate any suggestions from the community on how to proceed. I also have a few questions.

Circuits begin as simple graphs during logic synthesis. Many steps later they are placed on the chip, and only then do we know how congested each region is. I am trying to predict that congestion using only the data available at logic synthesis. I understand this to be a node regression problem. I am generating a pair of CSVs (nodes and edges) for each circuit as input for DGL.

Currently I have only 2 features, and I am considering adding more, since more features tend to improve learning, correct? For example, one feature is the total number of connections of a node; this could be separated into the number of inputs and the number of outputs.
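A minimal sketch of that separation, assuming the edges CSV has already been read into a list of directed (source, target) pairs with integer node IDs. The function name `degree_features` and the toy edge list are hypothetical, used only for illustration. In DGL the same counts should be available through `g.in_degrees()` and `g.out_degrees()` once the graph is built.

```python
from collections import Counter


def degree_features(num_nodes, edges):
    """Return (in_degree, out_degree) lists for a directed edge list.

    `edges` is assumed to be a list of (source, target) integer pairs.
    """
    out_count = Counter(src for src, _ in edges)
    in_count = Counter(dst for _, dst in edges)
    # Counter returns 0 for nodes that never appear, so isolated nodes are handled.
    in_deg = [in_count[n] for n in range(num_nodes)]
    out_deg = [out_count[n] for n in range(num_nodes)]
    return in_deg, out_deg


# Toy circuit (hypothetical): nodes 0 and 1 drive node 2, which drives node 3.
edges = [(0, 2), (1, 2), (2, 3)]
in_deg, out_deg = degree_features(4, edges)

# The original single feature is the sum of the two new ones.
total = [i + o for i, o in zip(in_deg, out_deg)]
```

Whether the two separate counts actually help more than the single total is something only an experiment on the real data can tell.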

The code I currently have splits the nodes of a single graph into train, validation and test sets. What if I want to train with 60% of all the available graphs and leave 40% for validation/test? How can I do that? Is there an example in the documentation?

Hi @gudeh, a larger number of features does not necessarily improve your results; it depends on your task.

About dataset splitting, you can use AsNodePredDataset.

Hi @peizhou001, thanks for your answer.

The documentation reads “Contains only one graph, accessible from dataset[0].” However, my idea is to create my own dataset with multiple independent graphs representing different circuits. Should I still use AsNodePredDataset?

It wouldn’t make sense to unite all circuits in a single graph.

I believe I have to do something similar to the example in dgl/train_ppi.py at master · dmlc/dgl · GitHub, as mentioned by @minjie in another closed discussion. However, that example uses a standard dataset. What should I change when I have my own custom dataset? I suppose this should be easy.
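For the graph-level split itself (60% of the circuits for training, the rest for validation/test), a minimal sketch that does not depend on any particular dataset class is to shuffle the graph indices once and slice them. The function name `split_graph_indices`, the 20%/20% division of the remaining 40%, and the fixed seed are assumptions made for illustration.

```python
import random


def split_graph_indices(num_graphs, train_frac=0.6, valid_frac=0.2, seed=0):
    """Split graph indices into train/valid/test lists at the graph level.

    Every graph lands in exactly one split, so no circuit seen during
    training is also used for validation or testing.
    """
    indices = list(range(num_graphs))
    random.Random(seed).shuffle(indices)  # seeded for a reproducible split

    n_train = round(train_frac * num_graphs)
    n_valid = round(valid_frac * num_graphs)

    train = indices[:n_train]
    valid = indices[n_train:n_train + n_valid]
    test = indices[n_train + n_valid:]  # whatever remains
    return train, valid, test


# Example with 10 circuits (hypothetical count).
train_idx, valid_idx, test_idx = split_graph_indices(10)
```

The resulting index lists can then select graphs from whatever container holds them, for example `[graphs[i] for i in train_idx]` or, with a PyTorch-style dataset, `torch.utils.data.Subset(dataset, train_idx)`.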