Hi,
I am experiencing a bit of a mystery: when training my model (it is actually an RL agent, but I don't think that is important here) I get deterministic results with seeded runs, except when I use a GlobalAttentionPooling layer. I have checked that Python, NumPy, PyTorch and DGL are all seeded, and the weights of the linear layers (the gate_nn) that compute the attention scores are always initialised in the same way.
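For context, my seeding is essentially the sketch below (`seed_everything` is just an illustrative helper; the `dgl.seed(seed)` call is omitted so the snippet runs without DGL installed):

```python
import random
import numpy as np
import torch as th

def seed_everything(seed: int = 0):
    """Seed every RNG source; dgl.seed(seed) would go here as well."""
    random.seed(seed)
    np.random.seed(seed)
    th.manual_seed(seed)
    th.cuda.manual_seed_all(seed)
    # Prefer deterministic kernels where available; with warn_only=True,
    # ops without a deterministic implementation (e.g. some CUDA
    # scatter-adds used in graph readouts) emit a warning instead of failing.
    th.use_deterministic_algorithms(True, warn_only=True)

# Two runs from the same seed should produce identical draws:
seed_everything(0)
a = th.randn(4, 4)
seed_everything(0)
b = th.randn(4, 4)
assert th.equal(a, b)
```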
Does anyone have a clue what could cause this?
My code, where I also experimented a bit with using multiple heads, looks like this.
Initialization:
```python
self.number_of_heads = 6
self.heads = nn.ModuleList()
for i in range(self.number_of_heads):
    gate_net = nn.Linear(h_dim, 1)
    nn.init.xavier_uniform_(gate_net.weight)
    self.heads.append(GlobalAttentionPooling(gate_net))
```
My forward pass for that part of the model then looks like this:
```python
output = self.heads[0](batched_graph, output_rgcn)
for i in range(self.number_of_heads - 1):
    output = th.cat((output, self.heads[i + 1](batched_graph, output_rgcn)), 1)
```
Where output_rgcn are the node features coming out of my R-GCN.
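For completeness, my understanding is that each head computes what this pure-PyTorch sketch does for a single graph (written without DGL so it runs standalone; `h_dim` and the shapes are illustrative):

```python
import torch as th
import torch.nn as nn

h_dim = 8
gate_nn = nn.Linear(h_dim, 1)

def attention_pool(node_feats: th.Tensor) -> th.Tensor:
    # gate_nn scores each node, softmax over the nodes turns the scores
    # into attention weights, and the readout is the weighted sum of
    # node features.
    weights = th.softmax(gate_nn(node_feats), dim=0)      # (N, 1)
    return (weights * node_feats).sum(dim=0, keepdim=True)  # (1, h_dim)

feats = th.randn(5, h_dim)
pooled = attention_pool(feats)
assert pooled.shape == (1, h_dim)
```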
With only 1 head I get the same nondeterminism.
Kind regards,
Erik