Thursday, September 15, 2016
Yann LeCun: What's so great about "Extreme Learning Machines"?
"""
There is an interesting sociological phenomenon taking place in some corners of machine learning right now. A small research community, largely centered in China, has rallied around the concept of "Extreme Learning Machines".
Frankly, I don't understand what's so great about ELM. Would someone please care to explain?
An ELM is basically a 2-layer neural net in which the first layer is fixed and random, and the second layer is trained. There are a number of issues with this idea.
First, the name: an ELM is exactly what Minsky & Papert call a Gamba Perceptron (a Perceptron whose first layer is a bunch of linear threshold units). The original 1958 Rosenblatt perceptron was an ELM in that the first layer was randomly connected.
Second, the method: connecting the first layer randomly is just about the stupidest thing you could do. People have spent the almost 60 years since the Perceptron coming up with better schemes to non-linearly expand the dimension of an input vector so as to make the data more separable (many of which are documented in the 1974 edition of Duda & Hart). Let's just list a few: using families of basis functions such as polynomials, using "kernel methods" in which the basis functions (aka neurons) are centered on the training samples, using clustering or GMM to place the centers of the basis functions where the data is (something we used to call RBF networks), and using gradient descent to optimize the position of the basis functions (aka a 2-layer neural net trained with backprop).
Setting the layer-one weights randomly (if you do it in an appropriate way) can possibly be effective if the function you are trying to learn is very simple, and the amount of labelled data is small. The advantages are similar to those of an SVM (though to a lesser extent): the number of parameters that need to be trained with supervision is small (since the first layer is fixed) and they are easily regularized (since they constitute a linear classifier). But then, why not use an SVM or an RBF net in the first place?
There may be a very narrow area of simple classification problems with small datasets where this kind of 2-layer net with random first layer may perform OK. But you will never see them beat records on complex tasks, such as ImageNet or speech recognition.
http://www.extreme-learning-machines.org/ """
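The construction described in the post above (a fixed random first layer, with only the second layer trained) can be sketched in a few lines. This is a minimal illustration, not the ELM authors' reference implementation: the tanh activation, the Gaussian random weights, the ridge penalty, the XOR toy data, and the names `fit_elm` and `predict_elm` are all assumptions made for the example.

```python
import numpy as np

def fit_elm(X, y, n_hidden=20, ridge=1e-3, seed=0):
    """Fit a 2-layer net whose first layer is random and never trained."""
    rng = np.random.default_rng(seed)
    # First layer: random weights and biases, fixed after this point.
    W = rng.normal(size=(X.shape[1], n_hidden))
    b = rng.normal(size=n_hidden)
    H = np.tanh(X @ W + b)  # hidden activations, shape (n_samples, n_hidden)
    # Second layer: ridge-regularized linear least squares on H.
    A = H.T @ H + ridge * np.eye(n_hidden)
    beta = np.linalg.solve(A, H.T @ y)
    return W, b, beta

def predict_elm(model, X):
    """Apply the fixed random layer, then the trained linear readout."""
    W, b, beta = model
    return np.tanh(X @ W + b) @ beta

# Toy problem (assumed for illustration): XOR, which no linear model on the
# raw inputs can fit, but which becomes linear in the random hidden space.
X = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
y = np.array([0.0, 1.0, 1.0, 0.0])

model = fit_elm(X, y)
pred = predict_elm(model, X)
print(np.round(pred, 2))
```

Only `beta` is learned, which is why the training step reduces to a single linear solve. It is also why the approach is limited in the way the post argues: the hidden features never adapt to the data.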
kjearns: However, I think that even if LeCun is overly negative about random features, he is correct about them being less powerful than learned features. You can see this in the Deep Fried Convnets paper (http://arxiv.org/abs/1412.7149), which looks at both ordinary (random features) and adaptive (learned features) versions of Fastfood. The differences are invisible on MNIST, but substantial on ImageNet. I think this points towards the limits of what is possible with random features, and indicates that even though random features sometimes work surprisingly well, learned features are genuinely more powerful.
Another reason to think random features won't scale is that this idea never caught on: http://www.robotics.stanford.edu/~ang/papers/nipsdlufl10-RandomWeights.pdf. That paper shows that using random weights with a linear classifier on top works well for comparing shallow network architectures. This is great when it works, but if you try to do something similar with a modern ImageNet network it fails horribly (you can see it fail horribly in figure 3 of the paper "How transferable are features in deep neural networks?", http://arxiv.org/abs/1411.1792).
"""
I think he's acknowledging their usefulness in SVM contexts, but asking why in that case you wouldn't just use an SVM.
"""
kjearns: The reason you wouldn't just use an SVM is to avoid building the full kernel matrix. When you have lots of data this is really expensive, but using random features means you can easily train kernelized SVMs with SGD.
"""
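The point about avoiding the full kernel matrix can be illustrated with random Fourier features. This is one well-known random-feature construction, and the comment does not say which one is meant. The sketch maps each input to a random feature vector whose inner products approximate an RBF kernel, then trains a linear SVM on those features with plain SGD on the hinge loss. The toy data, the hyperparameters, and the names `rff_map` and `sgd_hinge` are assumptions for illustration only.

```python
import numpy as np

def rff_map(X, n_features=500, gamma=1.0, seed=0):
    """Random features z(x) with z(x) . z(y) ~ exp(-gamma * ||x - y||^2)."""
    rng = np.random.default_rng(seed)
    # Frequencies are drawn from the Fourier transform of the RBF kernel.
    W = rng.normal(scale=np.sqrt(2.0 * gamma), size=(X.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def sgd_hinge(Z, y, lr=0.5, lam=1e-4, epochs=30, seed=0):
    """Linear SVM trained by SGD: L2 shrinkage plus hinge-loss updates."""
    rng = np.random.default_rng(seed)
    w = np.zeros(Z.shape[1])
    b = 0.0
    for _ in range(epochs):
        for i in rng.permutation(len(y)):
            w *= 1.0 - lr * lam                 # weight decay from the L2 term
            if y[i] * (Z[i] @ w + b) < 1.0:     # margin violated: hinge step
                w += lr * y[i] * Z[i]
                b += lr * y[i]
    return w, b

# Toy data (assumed): a blob at the origin (+1) surrounded by a ring (-1),
# which is not linearly separable in the original 2-D space.
rng = np.random.default_rng(1)
n = 100
inner = 0.3 * rng.normal(size=(n, 2))
theta = rng.uniform(0.0, 2.0 * np.pi, size=n)
radius = 3.0 + 0.2 * rng.normal(size=n)
outer = np.column_stack([radius * np.cos(theta), radius * np.sin(theta)])
X = np.vstack([inner, outer])
y = np.concatenate([np.ones(n), -np.ones(n)])

Z = rff_map(X)  # shape (200, 500); no 200 x 200 kernel matrix is ever built

# Sanity check on a few points: feature inner products vs. the exact kernel.
idx = [0, 1, 100, 101]
sq_dists = ((X[idx][:, None, :] - X[idx][None, :, :]) ** 2).sum(axis=-1)
kernel_err = np.abs(Z[idx] @ Z[idx].T - np.exp(-1.0 * sq_dists)).max()

w, b = sgd_hinge(Z, y)
train_acc = np.mean(np.sign(Z @ w + b) == y)
print(f"max kernel error: {kernel_err:.3f}, training accuracy: {train_acc:.2f}")
```

Memory and per-step cost scale with the number of random features rather than with the square of the number of training points, which is the trade-off the comment refers to.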
https://www.reddit.com/r/MachineLearning/comments/34u0go/yann_lecun_whats_so_great_about_extreme_learning/
Labels: Oleg Zabluda