About the following in the file:
Sparse Emb
Why are the gradients not cleared in the sparse embedding? Could you explain the benefit? Thanks very much.
Sorry, I just saw it.
But why is this 'zero_grad' control written as a separate call?
Are there cases where the gradients need to be accumulated?
We follow the style of the PyTorch optimizer. If you do not call zero_grad(), the gradients will be accumulated.
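A minimal sketch of that behaviour, using a plain `torch.optim.SGD` rather than the sparse embedding optimizer discussed here. The tensor `w` and the helper `loss_fn` are made up for illustration only:

```python
import torch

# Hypothetical toy parameter, used only for illustration.
w = torch.tensor([1.0, 2.0], requires_grad=True)
opt = torch.optim.SGD([w], lr=0.1)

def loss_fn(scale):
    # d(loss)/dw is `scale` for every element of w.
    return (w * scale).sum()

# Two backward passes without zero_grad(): the gradients add up.
loss_fn(1.0).backward()
loss_fn(2.0).backward()
accumulated = w.grad.clone()  # 1.0 + 2.0 per element

# Apply the accumulated gradient, then clear it explicitly.
opt.step()
opt.zero_grad()

# After zero_grad(), the next backward pass starts from a clean gradient.
loss_fn(1.0).backward()
fresh = w.grad.clone()  # 1.0 per element

print(accumulated, fresh)
```

Keeping zero_grad() as a separate call lets you choose when to clear: for example, you can run backward on several mini-batches and call step() once, which behaves like a larger batch.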
Thanks very much.