How to Get Different Splits for Cross-Validation
Could you please tell me how to conduct cross-validation with different splits?

I noticed in the GCN paper that the authors conducted a 5-fold cross-validation experiment with 5 different train/test splits.

However, the train/validation/test split provided by DGL seems to be fixed. Do we need to split the datasets manually if we want to conduct cross-validation, or can this be done directly with DGL?

If we need to do this manually, could you please tell me how the datasets should be split? I did not find any details in the paper, e.g., how many train/test points in total, or how many train/test points per category.

Thank you very much!

Currently we use binary masks to indicate which nodes belong to the training/validation/test sets. See L30 of the dgl GCN example.
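Since the masks are just boolean arrays over the nodes, you can build your own k-fold splits and overwrite them. Here is a minimal sketch using scikit-learn's `KFold` to generate index sets; the node count (2708, Cora's size) and the `ndata` field names are assumptions from the DGL Cora example and may differ in your setup:

```python
import numpy as np
from sklearn.model_selection import KFold

num_nodes = 2708  # Cora's node count; substitute your graph's
kf = KFold(n_splits=5, shuffle=True, random_state=0)

folds = []
for train_idx, test_idx in kf.split(np.arange(num_nodes)):
    # Build boolean membership masks, one per fold
    train_mask = np.zeros(num_nodes, dtype=bool)
    test_mask = np.zeros(num_nodes, dtype=bool)
    train_mask[train_idx] = True
    test_mask[test_idx] = True
    folds.append((train_mask, test_mask))
    # Then attach them to the graph before training, e.g.
    # g.ndata['train_mask'] = torch.from_numpy(train_mask)
```

Each node ends up in the test set of exactly one fold, so averaging the 5 test accuracies gives the cross-validated estimate.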

For details about how cross validation was performed, I recommend emailing Thomas directly.

Hi @mufeili. I noticed that the train/test masks used in the GCN tutorial are contiguous. Do you think this affects model performance in any way? I am thinking that in a multi-class setting it may be impossible to have contiguous node masks for all node classes (or at least enough to counteract class imbalance). What do you think? I look forward to hearing from you. Thanks!

Hi, whether the masks are contiguous has no effect, as the masks only correspond to the indexing of nodes.
