Oleg Zabluda's blog
Friday, January 26, 2018
Dropout in Recurrent Networks
Before Gal and Ghahramani [6], a new dropout mask was sampled at each time step. Empirical results led many to believe that noise added to the recurrent connections (the connections between RNN units) would be amplified over long sequences and drown out the signal [7]. Consequently, it was concluded that dropout should be applied only to the inputs and outputs of the RNN. (See the left part of the figure below.)
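To make the contrast concrete, here is a minimal sketch in plain Python of the two masking schemes: a fresh mask at every time step versus one mask per sequence reused at every step, which is how I understand the approach in [6]. The helper names (sample_mask, per_step_masks, shared_masks) and the 1/(1-p) rescaling of kept units are assumptions for illustration and are not taken from the paper or from Keras.

```python
import random

def sample_mask(size, drop_prob, rng):
    """Return a dropout mask: 0.0 for dropped units, 1/(1-p) for kept units."""
    keep = 1.0 - drop_prob
    return [0.0 if rng.random() < drop_prob else 1.0 / keep for _ in range(size)]

def per_step_masks(num_steps, size, drop_prob, rng):
    """Older practice: draw an independent mask at every time step."""
    return [sample_mask(size, drop_prob, rng) for _ in range(num_steps)]

def shared_masks(num_steps, size, drop_prob, rng):
    """Tied masks: draw one mask per sequence and reuse it at every time step."""
    mask = sample_mask(size, drop_prob, rng)
    return [mask for _ in range(num_steps)]

rng = random.Random(0)  # fixed seed so the run is repeatable
fresh = per_step_masks(num_steps=5, size=8, drop_prob=0.5, rng=rng)
tied = shared_masks(num_steps=5, size=8, drop_prob=0.5, rng=rng)

# With tied masks, the same units are dropped at every step of the sequence.
print(all(step == tied[0] for step in tied))
```

With tied masks the same units stay dropped for the whole sequence, so the noise is not resampled at every recurrent step.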
As for word-embedding dropout, Keras seems to have once had a dropout parameter in its Embedding layer, but it has been removed for some reason. [...] The Keras implementation mainly resides in the LSTM class. [OZ: detailed explanations follow...]
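As a rough illustration of word-embedding dropout, the sketch below zeroes whole rows of a toy embedding table, so that every occurrence of a dropped word in the sequence is dropped together. This follows my reading of [6], where word types rather than individual tokens are dropped. It is not the removed Keras parameter; the function name embedding_dropout, the toy table, and the 1/(1-p) rescaling are all assumptions.

```python
import random

def embedding_dropout(embeddings, token_ids, drop_prob, rng):
    """Look up token_ids after zeroing whole rows (word types) of the table."""
    keep = 1.0 - drop_prob
    # One Bernoulli draw per vocabulary entry, not per token position.
    row_scale = [0.0 if rng.random() < drop_prob else 1.0 / keep
                 for _ in embeddings]
    return [[x * row_scale[t] for x in embeddings[t]] for t in token_ids]

rng = random.Random(0)  # fixed seed so the run is repeatable
vocab = [[0.1, 0.2], [0.3, 0.4], [0.5, 0.6]]  # toy table: 3 words, 2 dimensions
sentence = [0, 2, 0, 1, 0]                    # word 0 appears three times
dropped = embedding_dropout(vocab, sentence, drop_prob=0.5, rng=rng)

# Every occurrence of the same word gets the same (possibly zeroed) vector.
print(dropped[0] == dropped[2] == dropped[4])
```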

[6] Gal, Y., & Ghahramani, Z. (2015). A Theoretically Grounded Application of Dropout in Recurrent Neural Networks.
[7] Zaremba, W., Sutskever, I., & Vinyals, O. (2014). Recurrent Neural Network Regularization.
https://becominghuman.ai/learning-note-dropout-in-recurrent-networks-part-1-57a9c19a2307 (part 1)
https://towardsdatascience.com/learning-note-dropout-in-recurrent-networks-part-2-f209222481f8 (part 2)
https://towardsdatascience.com/learning-note-dropout-in-recurrent-networks-part-3-1b161d030cd4 (part 3)


