Is the gradient result of dgl.ops with max/min operation correct?

Hi, I have a small question here.

Take dgl.ops.copy_u_max as an example.

import dgl
import dgl.ops as F   # assuming F refers to dgl.ops
import torch as th

g = dgl.graph(([0, 0, 0, 1, 1, 2], [0, 1, 2, 1, 2, 2]))
x = th.ones(3, 2, requires_grad=True)
out = F.copy_u_max(g, x)
out = out.sum()
out.backward()
print(x.grad)
# tensor([[1., 1.],
#         [1., 1.],
#         [1., 1.]])

We can see that in a situation like max(x1 = 3, x2 = 3) = 3, the gradient is routed to only one of the inputs, either x1 or x2. But from my perspective, both x1 and x2 contribute to the result, so both should receive gradients. That is why I am asking. Hoping for your answer. Thanks.

No, we only count the contribution of one of the inputs.
Your point makes sense; I'm not sure whether there is a standard behavior here.
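
For instance, here is a minimal sketch with a single destination node that has two tied sources (which source wins the tie, and therefore which row gets the gradient, is an implementation detail):

import dgl
import dgl.ops as ops
import torch as th

g = dgl.graph(([0, 1], [2, 2]))         # node 2 receives from both node 0 and node 1
x = th.ones(3, 2, requires_grad=True)   # the two sources carry identical features
ops.copy_u_max(g, x).sum().backward()
print(x.grad)
# only one of x[0] / x[1] receives the gradient for node 2's tied max;
# the other row (and x[2], which is never a source) stays at 0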

By contrast, PyTorch counts the contribution of all equal values and averages the gradient across them:

>>> x = th.tensor([3., 3., 0.], requires_grad=True)
>>> x.max().backward()
>>> x.grad
tensor([0.5000, 0.5000, 0.0000])
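
Note that this averaging only applies to the full reduction. My understanding is that torch.max with an explicit dim returns a single argmax and routes the whole gradient to it, similar to DGL's behavior:

import torch as th

x = th.tensor([3., 3., 0.], requires_grad=True)
values, indices = x.max(dim=0)   # a single argmax is returned per reduced slice
values.backward()
print(x.grad)
# e.g. tensor([1., 0., 0.]) -- only the selected tied element receives the gradient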

OK, thank you. It seems different deep learning frameworks have different behaviors here. PaddlePaddle counts the contribution of each equal value as 1, without averaging the gradients.
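
For reference, a minimal, untested sketch of the PaddlePaddle behavior described above (assuming Paddle's eager-mode autograd API):

import paddle

x = paddle.to_tensor([3., 3., 0.], stop_gradient=False)
x.max().backward()
print(x.grad)
# per the description above, each tied maximum would get a full gradient of 1, i.e. roughly [1., 1., 0.]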

Since max is not differentiable everywhere, I think that as long as the gradient is one of the subgradients, either way should work fine mathematically, although I'm not sure what the performance impact would be.
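
Concretely, for a two-way tie the subdifferential of max is the whole segment between the two one-hot choices:

\partial \max(x_1, x_2)\big|_{x_1 = x_2} = \{ (\lambda,\, 1 - \lambda) : \lambda \in [0, 1] \}

so both the one-hot gradient (1, 0) that DGL returns and the averaged gradient (1/2, 1/2) that PyTorch returns are valid subgradients.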
