Oleg Zabluda's blog
Thursday, April 27, 2017
 
"""
"""
Truncated backpropagation through time (BPTT) was developed in order to reduce the computational complexity of each parameter update in a recurrent neural network. [...] In practice, truncated BPTT splits the forward and backward passes into a set of smaller forward/backward pass operations. [...] For example, if we use truncated BPTT of length 4 time steps, learning looks like the following:
[...]
Note that the overall complexity of truncated BPTT and standard BPTT is approximately the same - both do the same number of time steps during the forward/backward pass. Using this method, however, we get 3 parameter updates instead of one for approximately the same amount of effort.
[...]
The downside of truncated BPTT is that the length of the dependencies learned in truncated BPTT can be shorter than in full BPTT. This is easy to see: consider the images above, with a TBPTT length of 4. Suppose that at time step 10, the network needs to store some information from time step 0 in order to make an accurate prediction. In standard BPTT, this is ok: the gradients can flow backwards all the way along the unrolled network, from time 10 to time 0. In truncated BPTT, this is problematic: the gradients from time step 10 simply don’t flow back far enough to cause the required parameter updates that would store the required information.
"""
https://deeplearning4j.org/usingrnns
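
A minimal PyTorch sketch of the scheme (the model, sizes, and data here are illustrative, not from the article): the sequence is processed in length-4 chunks; the hidden state is carried forward between chunks, so the forward pass still sees the full history, but it is detached so gradients cannot flow back past a chunk boundary - exactly the truncation described above.

import torch
import torch.nn as nn

torch.manual_seed(0)
seq_len, tbptt_len, input_size, hidden_size = 12, 4, 8, 16  # 12/4 = 3 updates

rnn = nn.RNN(input_size, hidden_size, batch_first=True)
head = nn.Linear(hidden_size, 1)
opt = torch.optim.SGD(list(rnn.parameters()) + list(head.parameters()), lr=0.01)
loss_fn = nn.MSELoss()

x = torch.randn(1, seq_len, input_size)  # (batch, time, features)
y = torch.randn(1, seq_len, 1)           # per-time-step targets

h = torch.zeros(1, 1, hidden_size)       # initial hidden state
for t0 in range(0, seq_len, tbptt_len):
    out, h = rnn(x[:, t0:t0 + tbptt_len], h)  # forward pass over one chunk
    loss = loss_fn(head(out), y[:, t0:t0 + tbptt_len])
    opt.zero_grad()
    loss.backward()   # backward pass stops at the start of the chunk
    opt.step()        # one parameter update per chunk
    # Detach so the next chunk's backward pass cannot reach into this one;
    # this is why dependencies longer than tbptt_len are hard to learn.
    h = h.detach()

With seq_len = 12 and a truncation length of 4, this loop performs the 3 parameter updates per pass mentioned above, while a dependency from time step 0 to time step 10 would never receive gradient, because backward() is cut off at each detach().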
