Oleg Zabluda's blog
Tuesday, October 04, 2016
 
Sepp Hochreiter's Fundamental Deep Learning Problem (1991) | Jürgen Schmidhuber, 2013
"""
A first milestone of Deep Learning research was the 1991 diploma thesis of Sepp Hochreiter [1], my very first student, who is now a professor in Linz. His work formally showed that deep neural networks are hard to train, because they suffer from the now famous problem of vanishing or exploding gradients: in typical deep or recurrent networks, [...] they decay exponentially in the number of layers, or they explode. All our subsequent Deep Learning research of the 1990s and 2000s was motivated by this insight.

The thesis is in German [1]. [...] Ten years later, an additional survey came out in English [2].

We have found four ways of partially overcoming the Fundamental Deep Learning Problem:

I. My first Deep Learner of 1991 overcame it through unsupervised pre-training for a hierarchy of (recurrent) neural networks [4]. This greatly facilitated subsequent supervised credit assignment through back-propagation.

II. LSTM-like networks (since 1997) [5] avoid the problem through a special architecture unaffected by it.

III. Today, a million times faster GPU-based computers allow for propagating errors a few layers further down within reasonable time, even in traditional NN - that's basically what's winning many of the image competitions now, e.g., [6]. (Although this does not really overcome the problem in a fundamental way.)
"""
http://people.idsia.ch/~juergen/fundamentaldeeplearningproblem.html
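
The exponential decay described in the quote is easy to see numerically. A minimal sketch in Python/NumPy (the depth, width, and weight scales are illustrative choices, not from the post): each backward step through a linear layer multiplies the error signal by a transposed weight matrix, so after `depth` layers its norm scales roughly like weight_scale**depth, vanishing below 1 and exploding above it.

import numpy as np

rng = np.random.default_rng(0)
depth, width = 50, 64

def grad_norm(weight_scale):
    # Norm of the error signal after backprop through `depth` linear layers.
    grad = rng.standard_normal(width)
    for _ in range(depth):
        # The 1/sqrt(width) scaling makes weight_scale the average per-layer
        # gain, so the norm scales roughly like weight_scale**depth.
        W = weight_scale * rng.standard_normal((width, width)) / np.sqrt(width)
        grad = W.T @ grad  # one chain-rule step per layer
    return np.linalg.norm(grad)

for scale in (0.5, 1.0, 2.0):
    print(f"weight scale {scale}: |grad| ~ {grad_norm(scale):.2e}")

Over 50 layers the signal shrinks by roughly 16 orders of magnitude at scale 0.5 and grows by roughly 15 at scale 2.0; saturating nonlinearities such as tanh push the small-weight case even further toward vanishing, which is the effect Hochreiter analyzed.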
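Point II can be made concrete the same way. The LSTM cell state is updated additively, c_t = f_t * c_{t-1} + i_t * g_t, so along the cell-state path the backpropagated error is multiplied only by the forget gate f_t rather than by a full recurrent Jacobian; in the original 1997 formulation there was no forget gate at all (equivalently f_t = 1), giving Hochreiter and Schmidhuber's "constant error carousel". A toy comparison with hypothetical per-step gains:

import numpy as np

T = 100  # time steps to backpropagate through

# LSTM cell-state path: dC_T/dC_0 is just the product of the forget gates.
# With f_t near 1 (exactly 1 in the 1997 LSTM), the error survives long lags.
forget_gates = np.full(T, 0.99)
print("LSTM cell path:", forget_gates.prod())  # ~0.37 after 100 steps

# Vanilla RNN path: each step multiplies by a Jacobian whose effective gain
# is typically well below 1, so the error vanishes exponentially.
rnn_gains = np.full(T, 0.5)
print("vanilla RNN:   ", rnn_gains.prod())     # ~8e-31 after 100 steps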
