Oleg Zabluda's blog
Thursday, September 15, 2016
 
"Штурм термоядерной крепости" (Storming the Thermonuclear Fortress). Moscow: Nauka, 1985
(issue 37 of the "Kvant Library" series)
http://ilib.mccme.ru/djvu/bib-kvant/shturm.htm

I was reading it in high school and wanted to do "Термояд" (controlled thermonuclear fusion). Boy, what a stupid idea that was.



 
Who invented backpropagation?

Originally shared by Juergen Schmidhuber

Efficient backpropagation (BP) is central to the ongoing Neural Network (NN) ReNNaissance and "Deep Learning."  Who invented it?

It is easy to find misleading accounts of BP's history. I had a look at the original papers from the 1960s and 70s, and talked to BP pioneers. Here is a summary derived from my recent survey, which has additional references:

The minimisation of errors through gradient descent (Hadamard, 1908) in the parameter space of complex, nonlinear, differentiable, multi-stage, NN-related systems has been discussed at least since the early 1960s (e.g., Kelley, 1960; Bryson, 1961; Bryson and Denham, 1961; Pontryagin et al., 1961; Dreyfus, 1962; Wilkinson, 1965; Amari, 1967; Bryson and Ho, 1969; Director and Rohrer, 1969), initially within the framework of Euler-Lagrange equations in the Calculus of Variations (e.g., Euler, 1744).

Steepest descent in the weight space of such systems can be performed (Bryson, 1961; Kelley, 1960; Bryson and Ho, 1969) by iterating the chain rule (Leibniz, 1676; L'Hôpital, 1696) à la Dynamic Programming (DP, Bellman, 1957). A simplified derivation of this backpropagation method uses the chain rule only (Dreyfus, 1962).

The systems of the 1960s were already efficient in the DP sense. However, they backpropagated derivative information through standard Jacobian matrix calculations from one "layer" to the previous one, without explicitly addressing either direct links across several layers or potential additional efficiency gains due to network sparsity (but perhaps such enhancements seemed obvious to the authors).

Explicit, efficient error backpropagation (BP) in arbitrary, discrete, possibly sparsely connected, NN-like networks apparently was first described in a 1970 master's thesis (Linnainmaa, 1970, 1976), albeit without reference to NNs. BP is also known as the reverse mode of automatic differentiation (e.g., Griewank, 2012), where the costs of forward activation spreading essentially equal the costs of backward derivative calculation. See early BP FORTRAN code (Linnainmaa, 1970) and closely related work (Ostrovskii et al., 1971).
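To make the reverse mode concrete, here is a minimal NumPy sketch (mine, not taken from any of the papers cited above) for a tiny two-layer network: one forward pass stores intermediates, and one backward sweep of the chain rule, roughly as cheap as the forward pass, yields all weight gradients.

import numpy as np

rng = np.random.default_rng(0)
W1, W2 = rng.normal(size=(4, 3)), rng.normal(size=(1, 4))
x, y = rng.normal(size=(3, 1)), np.array([[1.0]])

# Forward pass: keep the intermediates the backward sweep will need.
a = W1 @ x              # pre-activation, layer 1
h = np.tanh(a)          # hidden activation
yhat = W2 @ h           # output
loss = 0.5 * float((yhat - y) ** 2)

# Backward pass: propagate dL/d(...) from the output back to the input,
# one chain-rule (Jacobian) step per layer.
d_yhat = yhat - y                    # dL/dyhat
d_W2 = d_yhat @ h.T                  # dL/dW2
d_h = W2.T @ d_yhat                  # dL/dh
d_a = d_h * (1.0 - np.tanh(a) ** 2)  # dL/da, since tanh' = 1 - tanh^2
d_W1 = d_a @ x.T                     # dL/dW1

# One steepest-descent step on the weights (adapting control parameters
# this way is the step Dreyfus 1973 made explicit).
lr = 0.1
W1 -= lr * d_W1
W2 -= lr * d_W2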

BP was soon explicitly used to minimize cost functions by adapting control parameters (weights) (Dreyfus, 1973). This was followed by some preliminary, NN-specific discussion (Werbos, 1974, section 5.5.1), and a computer program for automatically deriving and implementing BP for any given differentiable system (Speelpenning, 1980).

To my knowledge, the first NN-specific application of efficient BP as above was described in 1981 (Werbos, 1981). Related work was published several years later (Parker, 1985; LeCun, 1985). A paper of 1986 significantly contributed to the popularisation of BP for NNs (Rumelhart et al., 1986).

Compare also the first adaptive, deep, multilayer perceptrons (Ivakhnenko et al., since 1965), whose layers are incrementally grown and trained by regression analysis. A paper of 1971 already described a deep GMDH network with 8 layers (Ivakhnenko, 1971).
Compare also a more recent method for multilayer threshold NNs (Bobrowski, 1978).

Precise references and more history in:

Deep Learning in Neural Networks: An Overview
PDF & LATEX source & complete public BIBTEX file under
http://www.idsia.ch/~juergen/deep-learning-overview.html

Juergen Schmidhuber
http://www.idsia.ch/~juergen/whatsnew.html

P.S.: I'll give talks on Deep Learning and other things in the NYC area around 1-5 and 18-19 August, and in the Bay area around 7-15 August; videos of previous talks can be found under http://www.idsia.ch/~juergen/videos.html

http://www.idsia.ch/~juergen/who-invented-backpropagation.html

#machinelearning
#artificialintelligence
#computervision
#deeplearning



 
"""
"""
Gaidon and colleagues used a popular game development engine, called Unity, to generate virtual scenes for training deep-learning algorithms [...] to recognize objects and situations in real images.
"""
https://www.technologyreview.com/s/601009/to-get-truly-smart-ai-might-need-to-play-more-video-games/

"""
off-the-shelf computer games, [...] photorealistic imagery [...] A team of researchers from Intel Labs and Darmstadt University in Germany has developed a clever way to extract useful training data from Grand Theft Auto. [...] created a software layer that sits between the game and a computer’s hardware, automatically classifying different objects in the road scenes shown in the game. [...] it would be nearly impossible to have people label all of the scenes with similar detail manually. The researchers also say that real training images can be improved with the addition of some synthetic imagery. [...] It takes thousands of hours to collect real street imagery, and thousands more to label all of those images. It’s also impractical to go through every possible scenario in real life, like crashing a car into a brick wall at a high speed.
"""
https://www.technologyreview.com/s/602317/self-driving-cars-can-learn-a-lot-by-playing-grand-theft-auto/



 
Yann LeCun: What's so great about "Extreme Learning Machines"?
"""
There is an interesting sociological phenomenon taking place in some corners of machine learning right now. A small research community, largely centered in China, has rallied around the concept of "Extreme Learning Machines".

Frankly, I don't understand what's so great about ELM. Would someone please care to explain?

An ELM is basically a 2-layer neural net in which the first layer is fixed and random, and the second layer is trained. There are a number of issues with this idea.

First, the name: an ELM is exactly what Minsky & Papert call a Gamba Perceptron (a Perceptron whose first layer is a bunch of linear threshold units). The original 1958 Rosenblatt perceptron was an ELM in that the first layer was randomly connected.

Second, the method: connecting the first layer randomly is just about the stupidest thing you could do. People have spent the almost 60 years since the Perceptron coming up with better schemes to non-linearly expand the dimension of an input vector so as to make the data more separable (many of which are documented in the 1973 edition of Duda & Hart). Let's just list a few: using families of basis functions such as polynomials, using "kernel methods" in which the basis functions (aka neurons) are centered on the training samples, using clustering or GMM to place the centers of the basis functions where the data is (something we used to call RBF networks), and using gradient descent to optimize the position of the basis functions (aka a 2-layer neural net trained with backprop).

Setting the layer-one weights randomly (if you do it in an appropriate way) can possibly be effective if the function you are trying to learn is very simple, and the amount of labelled data is small. The advantages are similar to those of an SVM (though to a lesser extent): the number of parameters that need supervised training is small (since the first layer is fixed) and easily regularized (since they constitute a linear classifier). But then, why not use an SVM or an RBF net in the first place?

There may be a very narrow area of simple classification problems with small datasets where this kind of 2-layer net with random first layer may perform OK. But you will never see them beat records on complex tasks, such as ImageNet or speech recognition.
http://www.extreme-learning-machines.org/
"""

"""
kjearns: However, I think that even if LeCun is overly negative about random features, he is correct about them being less powerful than learned features. You can see this in the Deep Fried Convnets paper (http://arxiv.org/abs/1412.7149), which looks at both ordinary (random features) and adaptive (learned features) versions of Fastfood. The differences are invisible on MNIST, but substantial on ImageNet. I think this points towards the limits of what is possible with random features, and indicates that even though sometimes random features work surprisingly well, learned features are genuinely more powerful.

Another reason to think random features won't scale is that this idea never caught on: http://www.robotics.stanford.edu/~ang/papers/nipsdlufl10-RandomWeights.pdf. That paper shows that using random weights with a linear classifier on top works well to compare shallow network architectures. This is great when it works, but if you try to do something similar with a modern ImageNet network it fails horribly (you can see it fail horribly in figure 3 of this paper [How transferable are features in deep neural networks?] http://arxiv.org/abs/1411.1792).
"""
I think he's acknowledging their usefulness in SVM contexts but asks why, in that case, you wouldn't just use an SVM.
"""
kjearns: The reason you wouldn't just use an SVM is to avoid building the full kernel matrix. When you have lots of data this is really expensive, but using random features means you can easily train kernelized SVMs with SGD.
"""
https://www.reddit.com/r/MachineLearning/comments/34u0go/yann_lecun_whats_so_great_about_extreme_learning/



 
Some airliners achieve about 127 mpg per passenger when full
https://en.wikipedia.org/wiki/Fuel_economy_in_aircraft
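As a back-of-the-envelope check (the fuel-burn and seat numbers below are illustrative assumptions, not figures from the article): per-passenger fuel economy is just the aircraft's miles per gallon multiplied by the number of occupied seats.

# Hypothetical numbers chosen to reproduce the headline figure.
gallons_per_mile = 5.0   # assumed fuel burn for a large airliner
seats = 635              # assumed full passenger load

aircraft_mpg = 1.0 / gallons_per_mile
per_passenger_mpg = aircraft_mpg * seats
print(per_passenger_mpg)  # 127.0 mpg per person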



 
"""
"""
SymPy is a Python library for symbolic mathematics. It aims to become a full-featured computer algebra system (CAS) while keeping the code as simple as possible in order to be comprehensible and easily extensible. SymPy is written entirely in Python.
"""
http://www.sympy.org/en/index.html
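For a quick taste, a few standard SymPy calls (all from its documented API):

import sympy as sp

x = sp.symbols('x')
expr = sp.sin(x) * sp.exp(x)

print(sp.diff(expr, x))               # exp(x)*sin(x) + exp(x)*cos(x)
print(sp.integrate(expr, x))          # exp(x)*sin(x)/2 - exp(x)*cos(x)/2
print(sp.limit(sp.sin(x) / x, x, 0))  # 1
print(sp.solve(x**2 - 2, x))          # [-sqrt(2), sqrt(2)]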

SymPy Live, running on Google App Engine:
http://live.sympy.org/

Tutorial
http://docs.sympy.org/latest/tutorial/index.html

Documentation
http://docs.sympy.org/latest/index.html
http://www.sympy.org/en/index.html



