Why does the order of edges in the graph construction step affect the training results?

I ran into a strange problem: when I change the order of the edges used to construct the graph, the training results (e.g. the loss at each step) change, and I would like to know why. I can reproduce the results when the edge order is fixed, so I have already taken care of the other sources of non-reproducibility. (A rough sketch of the comparison I mean follows the settings below.)

Settings:
- PyTorch: 1.5
- CUDA: 10.1
- DGL: 0.4.3
- Model: GraphSAGE
- Aggregator: pool
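
Roughly, the comparison looks like the sketch below (names and features are illustrative, not my actual code): the two graphs are logically identical and only the order of the edge list differs, yet training the same seeded model on them gives different per-step losses.

import torch
import dgl

# the same logical graph, built from two orderings of the edge list
src = torch.tensor([0, 1, 2, 3, 4])
dst = torch.tensor([1, 2, 3, 4, 0])
perm = torch.randperm(src.shape[0])

g1 = dgl.DGLGraph()
g1.add_nodes(5)
g1.add_edges(src, dst)               # original edge order

g2 = dgl.DGLGraph()
g2.add_nodes(5)
g2.add_edges(src[perm], dst[perm])   # same edges, permuted order

# training the same (seeded) GraphSAGE model on g1 vs. g2 gives different
# per-step losses, even though the two graphs are logically identical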

I assume you have fixed all sources of randomness like below?

import random
import numpy as np
import torch

np.random.seed(seed)
random.seed(seed)
torch.manual_seed(seed)                     # CPU RNG
torch.cuda.manual_seed_all(seed)            # RNG on all GPUs
torch.backends.cudnn.deterministic = True   # force deterministic cuDNN kernels
torch.backends.cudnn.benchmark = False      # disable cuDNN autotuning

In addition, DGL has some intrinsic randomness due to its backend kernels; see issue #1471.

Yes, I fixed all the randomness and I can reproduce all the results.
Anyway, I would like to know whether the order of the edges and the order of the neighbors are fixed for each iteration. (I see no shuffle before the LSTM reducer in your GraphSAGE code.)

It should be, but note that you cannot make any assumptions about edge order in the message-passing phase, as internal optimizations are performed there.
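
For instance, here is a minimal sketch (written against the DGL 0.4 API) that prints the order in which a user-defined reducer actually receives its neighbor messages; that order is an internal detail and need not match the order in which the edges were added:

import torch
import dgl
import dgl.function as fn

def inspect_reducer(nodes):
    # nodes.mailbox['m'] has shape (num_nodes_in_bucket, num_neighbors, feat_dim);
    # the order along the neighbor axis is not something user code should rely on
    print(nodes.mailbox['m'].squeeze(-1))
    return {'h_sum': nodes.mailbox['m'].sum(dim=1)}

g = dgl.DGLGraph()
g.add_nodes(4)
g.add_edges([0, 1, 2], [3, 3, 3])   # three neighbors of node 3
g.ndata['h'] = torch.arange(4, dtype=torch.float32).unsqueeze(1)
g.update_all(fn.copy_src('h', 'm'), inspect_reducer)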

Thanks, but I don't think I made it clear. What I mean is that in GraphSAGE, shuffling the neighbors before feeding them into the LSTM is what keeps the LSTM aggregator from depending on a fixed neighbor order, but in your GraphSAGE demo code the neighbors are not shuffled in the LSTM reducer. So I guess the edge order or neighbor order is already randomized in DGL for each iteration?
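
Concretely, what I have in mind is something like the following sketch (against the DGL 0.4 API, with illustrative dimensions, not the code from the demo): permute the neighbor axis of the mailbox before running the LSTM, so the aggregation does not depend on whatever message order DGL produces.

import torch
import torch.nn as nn

in_feats, hidden_feats = 16, 16   # illustrative sizes
lstm = nn.LSTM(in_feats, hidden_feats, batch_first=True)

def lstm_reducer_shuffled(nodes):
    # nodes.mailbox['m']: (num_nodes_in_bucket, num_neighbors, in_feats)
    m = nodes.mailbox['m']
    # shuffle the neighbor axis so the LSTM sees neighbors in a random order,
    # removing any dependence on the order DGL delivers the messages in
    perm = torch.randperm(m.shape[1], device=m.device)
    m = m[:, perm, :]
    batch = m.shape[0]
    h0 = (m.new_zeros(1, batch, hidden_feats),
          m.new_zeros(1, batch, hidden_feats))
    _, (h_n, _) = lstm(m, h0)
    return {'neigh': h_n.squeeze(0)}

# usage sketch: g.update_all(fn.copy_src('h', 'm'), lstm_reducer_shuffled)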