I will describe my problem and would appreciate any suggestions from the community on how to proceed. I also have a few questions.

When a circuit is fabricated, it starts out as a simple graph (the output of logic synthesis). Only many steps later is it placed on the chip, and only then do we know how congested each region is. I am trying to predict that congestion using only the data available after logic synthesis. As I understand it, this is a node regression problem. For each circuit, I generate a pair of CSV files (nodes and edges) as input for DGL.
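For reference, here is a minimal sketch of turning one node/edge CSV pair into the arrays a DGL graph is built from. The column names (`node_id`, `feat_0`, `feat_1`, `congestion`, `src_id`, `dst_id`) are hypothetical placeholders for whatever my files actually contain, and it assumes node IDs are contiguous integers starting at 0. The DGL step itself is only shown in comments so the parsing runs with the standard library alone.

```python
import csv
import io

# Hypothetical example files; the real ones come from my synthesis flow.
NODES_CSV = """node_id,feat_0,feat_1,congestion
0,2,1,0.10
1,3,2,0.45
2,1,1,0.30
"""
EDGES_CSV = """src_id,dst_id
0,1
1,2
0,2
"""

def load_circuit(nodes_file, edges_file):
    """Parse one circuit into node features, regression targets and an edge list.

    Assumes node IDs are 0..N-1 so they can be used directly as DGL node IDs.
    """
    nodes = sorted(csv.DictReader(nodes_file), key=lambda r: int(r["node_id"]))
    feats = [[float(r["feat_0"]), float(r["feat_1"])] for r in nodes]
    targets = [float(r["congestion"]) for r in nodes]  # per-node label
    src, dst = [], []
    for r in csv.DictReader(edges_file):
        src.append(int(r["src_id"]))
        dst.append(int(r["dst_id"]))
    return feats, targets, src, dst

feats, targets, src, dst = load_circuit(io.StringIO(NODES_CSV), io.StringIO(EDGES_CSV))
# With DGL and PyTorch installed, the graph would then be built roughly as:
#   g = dgl.graph((torch.tensor(src), torch.tensor(dst)), num_nodes=len(feats))
#   g.ndata["feat"] = torch.tensor(feats)
#   g.ndata["label"] = torch.tensor(targets)
```

Keeping the targets as a per-node tensor is what makes this node regression: the model outputs one value per node and the loss (for example mean squared error) compares it with `label`.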

Currently I have only 2 features, and I am considering adding more, since more features tend to improve learning, correct? For example, one of my features is the total number of connections of a node; this could be split into the number of inputs and the number of outputs.
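To illustrate the split, here is a small sketch that derives in-degree (inputs), out-degree (outputs) and total degree from a directed edge list, assuming edges point from a driving node to the node it feeds. In DGL the same counts should be available from `g.in_degrees()` and `g.out_degrees()`; the plain-Python version below just makes the idea explicit.

```python
from collections import Counter

def degree_features(num_nodes, src, dst):
    """Return [in_degree, out_degree, total_degree] for every node 0..num_nodes-1."""
    out_deg = Counter(src)  # edges leaving a node = its outputs
    in_deg = Counter(dst)   # edges entering a node = its inputs
    feats = []
    for n in range(num_nodes):
        i, o = in_deg.get(n, 0), out_deg.get(n, 0)
        feats.append([i, o, i + o])
    return feats

# Tiny example: node 0 drives nodes 1 and 2, which both drive node 3.
src = [0, 0, 1, 2]
dst = [1, 2, 3, 3]
print(degree_features(4, src, dst))  # [[0, 2, 2], [1, 1, 2], [1, 1, 2], [2, 0, 2]]
```

In this example every node has the same total degree (2), so the total alone cannot tell a source, an intermediate gate and a sink apart, while the input/output split can.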

The code I currently have splits the nodes of a single graph into train, validation and test sets. What if I want to train on 60% of all the available graphs and keep the remaining 40% for validation/test? How can I do that? Is there an example in the documentation?
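One way to picture a graph-level split is to shuffle the circuits themselves (not their nodes) and slice the list, as in the sketch below. The 60/20/20 fractions and the fixed seed are my own assumptions. As far as I know, DGL also offers `dgl.data.utils.split_dataset(dataset, frac_list=[0.6, 0.2, 0.2], shuffle=True)` for a dataset of graphs, and `dgl.batch` can merge several graphs into one batched graph so a node-level loss is computed over all nodes of the training circuits; I have not confirmed that this is the recommended workflow.

```python
import random

def split_graphs(graph_ids, train_frac=0.6, val_frac=0.2, seed=0):
    """Shuffle whole graphs and split them into train / validation / test lists.

    Every node of a given circuit ends up in the same split, so no test
    circuit is seen during training.
    """
    ids = list(graph_ids)
    random.Random(seed).shuffle(ids)  # reproducible shuffle
    n_train = int(len(ids) * train_frac)
    n_val = int(len(ids) * val_frac)
    train = ids[:n_train]
    val = ids[n_train:n_train + n_val]
    test = ids[n_train + n_val:]  # remainder goes to test
    return train, val, test

train_ids, val_ids, test_ids = split_graphs(range(10))
print(len(train_ids), len(val_ids), len(test_ids))  # 6 2 2
```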