I am using the Cora, Citeseer, and PubMed datasets for my knowledge-graph project with a Graph Attention Network (GAT) model.
The DGL library provides the train, validation, and test masks in the `ndata` dictionary of these datasets.
I printed the sizes of the train, validation, and test masks and noticed that the train mask is smaller than both the validation and test masks.
For example, for Cora the original sizes were:
"train_samples": 140,
"valid_samples": 500,
"test_samples": 1000,
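For reference, this is roughly how I checked the mask sizes. The masks here are stand-in boolean lists with Cora's node count; in actual DGL code they are boolean tensors accessed via `g.ndata["train_mask"]`, `g.ndata["val_mask"]`, and `g.ndata["test_mask"]` (the exact index ranges below are illustrative, not DGL's real node ordering):

```python
# Stand-in for the masks DGL stores on the graph's nodes.
# Cora has 2708 nodes; the planted split uses 140 / 500 / 1000 of them.
num_nodes = 2708
train_mask = [i < 140 for i in range(num_nodes)]         # first 140 nodes
val_mask = [140 <= i < 640 for i in range(num_nodes)]    # next 500 nodes
test_mask = [i >= 1708 for i in range(num_nodes)]        # last 1000 nodes

# Counting True entries gives the split sizes I reported above.
print("train_samples:", sum(train_mask))  # 140
print("valid_samples:", sum(val_mask))    # 500
print("test_samples:", sum(test_mask))    # 1000
```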
After swapping the train, validation, and test masks, the sizes became:
"train_samples": 1000,
"valid_samples": 140,
"test_samples": 500,
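The swap itself was just a reassignment of the three masks, sketched below with the same stand-in boolean lists (in real DGL code these would be the tensors in `g.ndata`):

```python
# Stand-in masks matching Cora's original 140 / 500 / 1000 split sizes.
num_nodes = 2708
train_mask = [i < 140 for i in range(num_nodes)]
val_mask = [140 <= i < 640 for i in range(num_nodes)]
test_mask = [i >= 1708 for i in range(num_nodes)]

# Swap: train on the 1000-node mask, validate on 140, test on 500.
train_mask, val_mask, test_mask = test_mask, train_mask, val_mask

print(sum(train_mask), sum(val_mask), sum(test_mask))  # 1000 140 500
```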
After the swap, the model's accuracy increased sharply (by roughly 4-9%, depending on the dataset).
In deep learning and data science, the training set is generally larger than the validation and test sets.
So how does DGL split these datasets, and what is the purpose of splitting them in this unusual way?