I wrote my own code to create training, validation, and testing splits from my canonical tuples after extracting them from a graph; however, during testing, I’m getting an error about some entities/relations not being recognized. I suspect this is because my test set includes some nodes that were not in the training set, which means I didn’t do a proper split.
The split I wrote creates the train/valid/test sets like this:
import random
all_ctups_len = len(all_ctups)  # all_ctups is a list of all canonical tuples
train_split = int(all_ctups_len * train_pct)  # train_pct is a float between 0 and 1
valid_split = int(train_split + all_ctups_len * valid_pct)  # valid_pct is a float between 0 and 1
# Randomly shuffle canonical edges for sampling
random.shuffle(all_ctups)
# Slice the shuffled list into training, validation, and test sets
train = all_ctups[:train_split]
valid = all_ctups[train_split:valid_split]
test = all_ctups[valid_split:]
Is there a built-in DGL method for splitting canonical tuples into train/valid/test sets that guarantees every entity and relation appearing in the validation and test sets also appears in the training set? If not, I can adjust my code accordingly. I just wanted to check first whether there's an efficient built-in way that I've missed.
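For reference, here is a rough sketch of the kind of adjustment I have in mind if there is no built-in. It is plain Python, not a DGL API. It assumes each canonical tuple is a (head, relation, tail) triple, and the function name `split_with_coverage` and the `seed` parameter are placeholders of my own. The idea is to first put into train any tuple that introduces a not-yet-seen entity or relation, and only then split the leftover tuples by percentage.

```python
import random

def split_with_coverage(triples, train_pct, valid_pct, seed=0):
    """Split (head, relation, tail) triples so that the training set
    contains every entity and relation (hypothetical helper, not part of DGL)."""
    rng = random.Random(seed)
    shuffled = list(triples)  # copy so the caller's list is not reordered
    rng.shuffle(shuffled)

    seen_entities, seen_relations = set(), set()
    train, rest = [], []
    # Pass 1: any tuple that introduces a new entity or relation must go to train.
    for h, r, t in shuffled:
        if h not in seen_entities or t not in seen_entities or r not in seen_relations:
            train.append((h, r, t))
            seen_entities.update((h, t))
            seen_relations.add(r)
        else:
            rest.append((h, r, t))

    # Pass 2: top up train to the requested size, then slice valid and test
    # from the remaining tuples, which contain only already-seen entities/relations.
    n = len(shuffled)
    n_train_extra = max(0, int(n * train_pct) - len(train))
    n_valid = int(n * valid_pct)
    train.extend(rest[:n_train_extra])
    valid = rest[n_train_extra:n_train_extra + n_valid]
    test = rest[n_train_extra + n_valid:]
    return train, valid, test
```

With my variables this would be called as `train, valid, test = split_with_coverage(all_ctups, train_pct, valid_pct)`. One caveat: if the graph has many rarely used entities, the covering pass alone can exceed the requested training size, so train ends up larger than `train_pct` and valid/test shrink.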
Thanks in advance! (And if anyone has code for how they do splits, I’d love to see it!)
Alex