Oleg Zabluda's blog
Thursday, September 13, 2018
 
"""
"""
loss scaling. “With single precision, anything smaller than a certain value ends up becoming a zero. In many networks we train, a large number of activation gradients fall below that threshold. We don’t want to lose those values that become zeroes, so we run a forward pass as usual, and the resulting loss is multiplied with a scaling factor that is made magnitudes larger. That is moved into a range that is representable in half precision, but before the updates are made, we scale those back down to make sure we’re staying accurate. It ultimately gets added to those single precision master weights and the process is repeated.”
"""
https://www.nextplatform.com/2017/10/11/baidu-sheds-precision-without-paying-deep-learning-accuracy-cost/
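The effect described in the quote can be shown with a short NumPy sketch (the gradient value, scale factor of 1024, and learning rate below are illustrative assumptions, not from the article): a gradient too small for fp16 flushes to zero, scaling the loss moves it into the representable range, and the update is unscaled in fp32 and applied to the fp32 master weight.

import numpy as np

# A tiny activation gradient that underflows in half precision:
# the smallest positive fp16 value is 2**-24 (~5.96e-8).
grad_fp32 = np.float32(1e-8)
print(np.float16(grad_fp32))            # 0.0 -- flushed to zero in fp16

# Loss scaling: multiplying the loss by S multiplies every gradient
# by S (backprop is linear), moving small gradients into fp16 range.
S = np.float32(1024.0)                  # power-of-two scale factor (assumed)
scaled_grad_fp16 = np.float16(grad_fp32 * S)
print(scaled_grad_fp16)                 # ~1.02e-05 -- now representable in fp16

# Before the weight update, unscale in fp32 and apply the result to the
# single-precision master copy of the weight.
master_weight = np.float32(0.5)         # hypothetical fp32 master weight
lr = np.float32(0.1)                    # hypothetical learning rate
master_weight -= lr * (np.float32(scaled_grad_fp16) / S)
print(master_weight)                    # master copy stays in fp32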
