Is the gradient result of dgl.ops with max/min operation correct?

Hi, I have a small question here.

Take dgl.ops.copy_u_max as an example.

import dgl
import dgl.ops as F   # assuming F refers to dgl.ops
import torch as th

g = dgl.graph(([0, 0, 0, 1, 1, 2], [0, 1, 2, 1, 2, 2]))
x = th.ones(3, 2, requires_grad=True)
out = F.copy_u_max(g, x)
out = out.sum()
out.backward()
print(x.grad)
# tensor([[1., 1.],
#         [1., 1.],
#         [1., 1.]])

We can see that in a situation like max(x1 = 3, x2 = 3) = 3, the gradient is routed to only one of the inputs, either x1 or x2. But from my perspective, both x1 and x2 contribute to the result, so both should receive gradients. That is why I am asking. Hoping for your answer. Thanks.

No, we only count the contribution of one of the inputs.
Your point makes sense; I'm not sure whether there is a standard behavior here.
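
For instance, here is a minimal sketch with a single destination node that has two tied sources (which source wins the tie, and therefore which row gets the gradient, is an implementation detail):

import dgl
import dgl.ops as ops
import torch as th

g = dgl.graph(([0, 1], [2, 2]))         # node 2 receives from both node 0 and node 1
x = th.ones(3, 2, requires_grad=True)   # the two sources carry identical features
ops.copy_u_max(g, x).sum().backward()
print(x.grad)
# only one of x[0] / x[1] receives the gradient for node 2's tied max;
# the other row (and x[2], which is never a source) stays at 0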

By contrast, PyTorch counts the contribution of all equal values and averages the gradient across them:

>>> x = th.tensor([3., 3., 0.], requires_grad=True)
>>> x.max().backward()
>>> x.grad
tensor([0.5000, 0.5000, 0.0000])
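
Note that this averaging only applies to the full reduction. My understanding is that torch.max with an explicit dim returns a single argmax and routes the whole gradient to it, similar to DGL's behavior:

import torch as th

x = th.tensor([3., 3., 0.], requires_grad=True)
values, indices = x.max(dim=0)   # a single argmax is returned per reduced slice
values.backward()
print(x.grad)
# e.g. tensor([1., 0., 0.]) -- only the selected tied element receives the gradient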

OK, thank you. It seems different deep learning frameworks have different behaviors here. PaddlePaddle counts the contribution of each equal value as 1, without averaging the gradients.
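
For reference, a minimal, untested sketch of the PaddlePaddle behavior described above (assuming Paddle's eager-mode autograd API):

import paddle

x = paddle.to_tensor([3., 3., 0.], stop_gradient=False)
x.max().backward()
print(x.grad)
# per the description above, each tied maximum would get a full gradient of 1, i.e. roughly [1., 1., 0.]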

Since max is not differentiable everywhere, I think that as long as the gradient is one of the subgradients, either way should work fine mathematically, although I'm not sure what the performance impact would be.
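
Concretely, for a two-way tie the subdifferential of max is the whole segment between the two one-hot choices:

\partial \max(x_1, x_2)\big|_{x_1 = x_2} = \{ (\lambda,\, 1 - \lambda) : \lambda \in [0, 1] \}

so both the one-hot gradient (1, 0) that DGL returns and the averaged gradient (1/2, 1/2) that PyTorch returns are valid subgradients.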
