How to Get Different Splits for Cross-Validation

Kqiii · January 10, 2020, 7:36am

Hi,

Could you please tell me how to conduct cross-validation with different splits?

I noticed in the GCN paper that the authors conducted a 5-fold cross-validation experiment with 5 different train/test splits.

However, the train/validation/test split provided by DGL seems to be fixed. Do we need to split the datasets manually if we want to run cross-validation, or can DGL do this directly?

If we need to do this manually, could you please tell me how the datasets should be split? I did not find any details in the paper, e.g., how many train/test points in total, how many train/test points per category, etc.

Thank you very much!

mufeili · January 10, 2020, 9:21am

Currently we use binary masks to indicate which nodes belong to the training, validation, and test sets. See L30 of the DGL GCN example.

For details about how cross-validation was performed, I recommend emailing Thomas directly.
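
As a rough illustration of the mask-based representation described above, here is a minimal sketch of generating different splits manually. It uses plain NumPy rather than any DGL API. The helper name `kfold_masks`, the seed, and the fold count are assumptions for illustration, and this is not necessarily how the GCN paper produced its splits.

```python
import numpy as np

def kfold_masks(num_nodes, num_folds=5, seed=0):
    """Yield one (train_mask, test_mask) pair of boolean arrays per fold.

    Hypothetical helper, not part of DGL.
    """
    rng = np.random.default_rng(seed)
    perm = rng.permutation(num_nodes)        # shuffle node IDs once
    folds = np.array_split(perm, num_folds)  # disjoint, near-equal folds
    for test_idx in folds:
        test_mask = np.zeros(num_nodes, dtype=bool)
        test_mask[test_idx] = True           # this fold is held out
        train_mask = ~test_mask              # all other nodes are used for training
        yield train_mask, test_mask

# 2708 is an example size (the number of nodes in Cora).
masks = list(kfold_masks(num_nodes=2708, num_folds=5, seed=0))
for fold, (train_mask, test_mask) in enumerate(masks):
    print(fold, int(train_mask.sum()), int(test_mask.sum()))
```

Each boolean array can be converted to a tensor (for example with `torch.from_numpy`) and used in place of the fixed training/test masks. A validation mask could be carved out of each training mask in the same way.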

aigo500 · February 5, 2020, 12:13pm

Hi @mufeili. I noticed that the train/test masks used in the GCN tutorial are contiguous. Do you think this affects model performance in any way? My concern is that in a multi-class setting it may be impossible to have contiguous node masks that cover all node classes (or at least enough of them to offset class imbalance). What do you think? I look forward to hearing from you. Thanks!

mufeili · February 6, 2020, 3:18am

Hi, whether the masks are contiguous has no effect, since contiguity only reflects how the nodes are indexed.
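
If class balance in the training set is the concern, a mask does not have to be contiguous at all. Below is a minimal sketch that samples a fixed number of training nodes per class. It assumes `labels` is a 1-D integer array of node labels; the helper name `per_class_train_mask` and the toy labels are hypothetical and not part of DGL.

```python
import numpy as np

def per_class_train_mask(labels, num_per_class, seed=0):
    """Return a boolean mask selecting up to `num_per_class` nodes per class.

    Hypothetical helper, not part of DGL.
    """
    rng = np.random.default_rng(seed)
    labels = np.asarray(labels)
    mask = np.zeros(labels.shape[0], dtype=bool)
    for c in np.unique(labels):
        idx = np.flatnonzero(labels == c)    # node IDs belonging to class c
        k = min(num_per_class, idx.size)     # guard against very small classes
        chosen = rng.choice(idx, size=k, replace=False)
        mask[chosen] = True                  # selected IDs need not be contiguous
    return mask

# Toy labels for illustration only.
labels = np.array([0, 0, 0, 0, 1, 1, 1, 2, 2, 2, 2, 2])
train_mask = per_class_train_mask(labels, num_per_class=2)
```

The resulting mask marks the same number of nodes for every class regardless of where those nodes sit in the index order.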