Hi, I have a question about the sampling in the RGCN sample code.
It looks like at each epoch we select a positive sample set of size `batch_size` and a negative sample set of size `batch_size * negative_rate`. This raises two questions.
First, we do this sampling only once per epoch, which means each epoch trains on a single batch. Since `batch_size` < `train_data.size`, we discard most of the positive samples in every epoch. (Of course, because the sampling is random, later epochs will draw some of the other positives, but (a) there is no guarantee we ever see all of them, and (b) each epoch still covers only one batch of positive samples.)
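For reference, here is a sketch of how I understand the per-epoch sampling to work (this is my own illustrative code, not the example's actual implementation; the function name `sample_edges` and its signature are mine):

```python
import numpy as np

def sample_edges(train_data, num_nodes, batch_size, negative_rate, rng):
    """One epoch's sample: a single positive batch plus corrupted negatives.

    train_data: array of (head, relation, tail) triplets, shape (N, 3).
    Returns all sampled triplets and their 1/0 labels.
    """
    # Draw one batch of positive triplets; the remaining positives
    # are simply unused this epoch.
    idx = rng.choice(len(train_data), size=batch_size, replace=False)
    pos = train_data[idx]

    # Create `negative_rate` negatives per positive by replacing
    # either the head or the tail with a random entity.
    neg = np.tile(pos, (negative_rate, 1))
    corrupt_entities = rng.integers(0, num_nodes, size=len(neg))
    corrupt_head = rng.random(len(neg)) < 0.5
    neg[corrupt_head, 0] = corrupt_entities[corrupt_head]
    neg[~corrupt_head, 2] = corrupt_entities[~corrupt_head]

    samples = np.concatenate([pos, neg])
    labels = np.zeros(len(samples), dtype=np.float32)
    labels[:batch_size] = 1.0  # positives first, then negatives
    return samples, labels
```

With the default `negative_rate = 10`, this yields `batch_size` positives and `10 * batch_size` negatives per epoch, which is the imbalance I am asking about below.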
Second, is there a reason to train on an imbalanced set? With the default parameters, each batch contains 10 times as many negative samples as positive ones.
Thanks in advance for the answers.