Nondeterministic behaviour with GAP


I am experiencing a bit of a mystery: when training my model (it is actually an RL agent but I don’t think that’s important for this question) I get deterministic results - with seeded runs - except when I use a Global Attention Pooling layer. I have checked that numpy, python, torch and dgl are all seeded. The weights of my linear layers (the gate_nn) that calculate the attention weights are also always initialised in the same way.
Does anyone have a clue what could cause this?

My code, where I also experimented a bit with setting multiple heads, looks like this:

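# inside __init__; assumes: from dgl.nn import GlobalAttentionPooling
self.number_of_heads = 6
self.heads = nn.ModuleList()
for i in range(self.number_of_heads):
    gate_net = nn.Linear(h_dim, 1)  # gate_nn that scores each node
    # assumed missing line: wrap the gate in a pooling head
    self.heads.append(GlobalAttentionPooling(gate_net))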

My forward pass for that part of the model then looks like this:

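# inside forward; assumes: import torch
output = self.heads[0](batched_graph, output_rgcn)
for i in range(self.number_of_heads - 1):
    # concatenate each further head's readout along the feature dimension
    output = torch.cat((output, self.heads[i + 1](batched_graph, output_rgcn)), 1)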

Where output_rgcn are the node features coming out of my R-GCN.
With only 1 head I get the same nondeterminism.

Kind regards,



I tried the GAP examples from the official docs (NN Modules (PyTorch), DGL 0.6.1 documentation), and the output of GAP is deterministic.

The non-deterministic issue you hit is probably caused by something else, not dgl.nn.GlobalAttentionPooling.

Could you try to simplify your model and dig deeper to see whether anything else turns up? If not, please paste the simplest complete code snippet that reproduces the issue.
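One possible shape for such a snippet is sketched below: seed everything, build the layer, run it twice, and compare the outputs bit for bit. It is plain PyTorch so that it runs without DGL; `ToyAttentionPooling`, `seed_everything`, `run_once` and `is_deterministic` are hypothetical names made up for this sketch, and the toy layer is only a single-graph stand-in, not DGL's implementation. To test the real thing, swap in dgl.nn.GlobalAttentionPooling and a batched graph, and also call dgl.seed.

```python
import random

import numpy as np
import torch
import torch.nn as nn


def seed_everything(seed: int) -> None:
    """Seed the Python, NumPy and PyTorch RNGs (add dgl.seed(seed) when using DGL)."""
    random.seed(seed)
    np.random.seed(seed)
    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # safe to call without a GPU
    # Ask cuDNN for deterministic kernels; only matters on GPU.
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False


class ToyAttentionPooling(nn.Module):
    """Hypothetical single-graph stand-in for an attention pooling layer."""

    def __init__(self, h_dim: int) -> None:
        super().__init__()
        self.gate_nn = nn.Linear(h_dim, 1)

    def forward(self, feat: torch.Tensor) -> torch.Tensor:
        # Softmax over nodes gives one attention weight per node: shape (N, 1).
        gate = torch.softmax(self.gate_nn(feat), dim=0)
        # Weighted sum over nodes gives one readout vector: shape (1, h_dim).
        return (gate * feat).sum(dim=0, keepdim=True)


def run_once(seed: int, h_dim: int = 8, num_nodes: int = 5) -> torch.Tensor:
    """Seed, build the layer, draw random node features, and pool them."""
    seed_everything(seed)
    model = ToyAttentionPooling(h_dim)
    feat = torch.randn(num_nodes, h_dim)
    with torch.no_grad():
        return model(feat)


def is_deterministic(seed: int = 0) -> bool:
    """True if two identically seeded runs give bit-identical outputs."""
    return torch.equal(run_once(seed), run_once(seed))


if __name__ == "__main__":
    print("deterministic:", is_deterministic())
```

If a check like this passes for the pooling layer on its own but the full model still differs between runs, the difference most likely comes from another component. Running the same comparison on the device you train on (CPU or GPU) matters, since results on one do not guarantee the same behaviour on the other.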
