Oleg Zabluda's blog
Thursday, April 27, 2017
“Three new graphical models for statistical language modelling” (2007) A. Mnih, G. E. Hinton
"""
4. A Log-Bilinear Language Model
the model predicts a feature vector for the next word by computing a linear function of the context word feature vectors. Then it assigns probabilities to all words in the vocabulary based on the similarity of their feature vectors to the predicted feature vector as measured by the dot product.
"""
https://www.cs.toronto.edu/~amnih/papers/threenew.pdf
Labels: Oleg Zabluda
Sheryl Sandberg admits political mistakes (delusions) of Marksism-LeanInism - telling women to get good marks in school and stuff
"""
Sandberg also uses the new book to address what she now sees as shortcomings in the career advice she offered women in "Lean In." Surveying the world as a wealthy corporate executive rendered her oblivious to the circumstances faced by less fortunate women, she acknowledges. Not everyone can lean in; not everyone wants to.
"""
https://www.yahoo.com/digest/20170424/facebooks-coo-pens-book-resilience-loss-admits-lean-tone-deaf-00217405
Labels: Oleg Zabluda
Training recurrent neural networks (2013) Ilya Sutskever: Ph.D. thesis
"""
2.5.2 Recurrent Neural Networks as Generative models
Generative models are parameterized families of probability distributions that extrapolate a finite training set to a distribution over the entire space. [...] An RNN defines a generative model over sequences if the loss function satisfies ...
[...]
5.3 The Objective Function
The goal of character-level language modeling is to predict the next character in a sequence.
The language modeling objective is to maximize the total log probability of the training sequence [...] which implies that the RNN learns a probability distribution over sequences
5.4.4 Debagging
It is easy to convert a sentence into a bag of words, but it is much harder to convert a bag of words into a meaningful sentence. We name the latter the debagging problem. We perform an experiment where a character-level language model evaluates every possible ordering of the words in the bag, and returns the ordering it deems best. To make the experiment tractable, we only considered bags of 7 words, giving a search space of size 5040. For our experiment, we used the MRNN [...] to debag 500 bags of randomly chosen words from “Anna Karenina”. We use 11 words for each bag, where the first two and the last two words are used as context to aid debagging the middle seven words. We say that the model correctly debags a sentence if the correct ordering is assigned the highest log probability. We found that the Wikipedia-trained MRNN recovered the correct ordering 34% of the time.
"""
http://www.cs.utoronto.ca/~ilya/pubs/ilya_sutskever_phd_thesis.pdf
Labels: Oleg Zabluda
"""
"""
Truncated backpropagation through time (BPTT) was developed in order to reduce the computational complexity of each parameter update in a recurrent neural network. [...] In practice, truncated BPTT splits the forward and backward passes into a set of smaller forward/backward pass operations. [...] For example, if we use truncated BPTT of length 4 time steps, learning looks like the following:
Note that the overall complexity for truncated BPTT and standard BPTT is approximately the same - both do the same number of time steps during the forward/backward pass. Using this method, however, we get 3 parameter updates instead of one for approximately the same amount of effort.
[...]
The downside of truncated BPTT is that the length of the dependencies learned in truncated BPTT can be shorter than in full BPTT. This is easy to see: consider the images above, with a TBPTT length of 4. Suppose that at time step 10, the network needs to store some information from time step 0 in order to make an accurate prediction. In standard BPTT, this is ok: the gradients can flow backwards all the way along the unrolled network, from time 10 to time 0. In truncated BPTT, this is problematic: the gradients from time step 10 simply don’t flow back far enough to cause the required parameter updates that would store the required information.
"""
https://deeplearning4j.org/usingrnns
Labels: Oleg Zabluda
Virtual Worlds as Proxy for Multi-Object Tracking Analysis (2016) Gaidon et al
https://arxiv.org/abs/1605.06457
http://www.cv-foundation.org/openaccess/content_cvpr_2016/papers/Gaidon_Virtual_Worlds_as_CVPR_2016_paper.pdf
Virtual KITTI dataset
"""
Virtual KITTI is a photo-realistic synthetic video dataset [...] contains 50 high-resolution monocular videos (21,260 frames) generated from five different virtual worlds in urban settings under different imaging and weather conditions. These worlds were created using the Unity game engine and a novel real-to-virtual cloning method. These photo-realistic synthetic videos are automatically, exactly, and fully annotated for 2D and 3D multi-object tracking and at the pixel level with category, instance, flow, and depth labels
"""
http://www.xrce.xerox.com/Our-Research/Computer-Vision/Proxy-Virtual-Worlds
Labels: Oleg Zabluda