I am running into a bit of a mystery: when training my model (it is actually an RL agent, but I don't think that matters for this question), seeded runs give deterministic results except when I use a GlobalAttentionPooling layer. I have checked that Python, NumPy, PyTorch and DGL are all seeded. The weights of the linear layers (the gate_nn) that compute the attention scores are also initialised identically on every run.
Does anyone have a clue what could cause this?
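For reference, the seeding boils down to roughly this (collapsed into one helper here for readability; the helper name and the seed value are just placeholders):

import random
import numpy as np
import torch as th
import dgl

def seed_everything(seed: int) -> None:
    # seed every RNG the run touches
    random.seed(seed)
    np.random.seed(seed)
    th.manual_seed(seed)           # seeds the CPU and CUDA generators
    th.cuda.manual_seed_all(seed)  # explicit, in case of multiple GPUs
    dgl.seed(seed)

seed_everything(42)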
My code, where I also experimented a bit with using multiple attention heads, looks like this:
self.number_of_heads = 6
self.heads = nn.ModuleList()
for i in range(self.number_of_heads):
    # one gate network per pooling head, initialised the same way every run
    gate_net = nn.Linear(h_dim, 1)
    nn.init.xavier_uniform_(gate_net.weight)
    self.heads.append(GlobalAttentionPooling(gate_net))
My forward pass for that part of the model then looks like this:
output = self.heads[0](batched_graph, output_rgcn)
for i in range(self.number_of_heads - 1):
    # concatenate the readout of each remaining head along the feature dimension
    output = th.cat((output, self.heads[i + 1](batched_graph, output_rgcn)), 1)
Here output_rgcn is the tensor of node features coming out of my R-GCN.
With only 1 head I get the same nondeterminism.
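In case it helps anyone reproduce this, a stripped-down single-head version with made-up sizes would look something like the following (a sketch with a random toy graph, not my actual training code):

import numpy as np
import torch as th
import dgl
from dgl.nn.pytorch.glob import GlobalAttentionPooling

def pooled_output(seed: int = 0) -> th.Tensor:
    # rebuild the same toy graph, features and gate network under one seed
    np.random.seed(seed)
    th.manual_seed(seed)
    dgl.seed(seed)
    g = dgl.rand_graph(20, 60)           # 20 nodes, 60 random edges
    feats = th.randn(g.num_nodes(), 16)  # stand-in for the R-GCN output
    gate_net = th.nn.Linear(16, 1)
    th.nn.init.xavier_uniform_(gate_net.weight)
    pool = GlobalAttentionPooling(gate_net)
    return pool(g, feats)

# if the layer itself is deterministic, these two seeded runs should match exactly
print(th.allclose(pooled_output(), pooled_output()))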