Oleg Zabluda's blog
Thursday, September 22, 2016
 
ping pong carnival (卓球芸人ぴんぽんまとめ [table-tennis entertainer "Pinpon" compilation] English Ver.)
https://www.youtube.com/watch?v=HYz73W_dufc&feature=share

 
Show and Tell: image captioning open sourced in TensorFlow
"""
This release contains significant improvements to the computer vision component of the captioning system, is much faster to train, and produces more detailed and accurate descriptions compared to the original system. These improvements are outlined and analyzed in the paper Show and Tell: Lessons learned from the 2015 MSCOCO Image Captioning Challenge, published in IEEE Transactions on Pattern Analysis and Machine Intelligence.

Our 2014 system used the Inception V1 [...] achieving 89.6% top-5 accuracy on [...] ImageNet 2012 [...] We replaced this in 2015 with the newer Inception V2 image classification model, which achieves 91.8% accuracy on the same task. The improved vision component gave our captioning system an accuracy boost of 2 points in the BLEU-4 metric (which is commonly used in machine translation to evaluate the quality of generated sentences) and was an important factor of its success in the captioning challenge.

Today’s code release initializes the image encoder using the Inception V3 model, which achieves 93.9% accuracy on the ImageNet classification task. [...] This gives an additional 2 points of improvement in the BLEU-4 metric over the system used in the captioning challenge.

Another key improvement to the vision component comes from fine-tuning the image model. This step addresses the problem that the image encoder is initialized by a model trained to classify objects in images, whereas the goal of the captioning system is to describe the objects in images using the encodings produced by the image model. For example, an image classification model will tell you that a dog, grass and a frisbee are in the image, but a natural description should also tell you the color of the grass and how the dog relates to the frisbee.

In the fine-tuning phase, the captioning system is improved by jointly training its vision and language components on human generated captions. This allows the captioning system to transfer information from the image that is specifically useful for generating descriptive captions, but which was not necessary for classifying objects. In particular, after fine-tuning it becomes better at correctly describing the colors of objects. Importantly, the fine-tuning phase must occur after the language component has already learned to generate captions - otherwise, the noisiness of the randomly initialized language component causes irreversible corruption to the vision component. For more details, read the full paper here.
"""
https://research.googleblog.com/2016/09/show-and-tell-image-captioning-open.html
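
The two-phase schedule described above (train the language component against a frozen encoder first, then fine-tune vision and language jointly) is easy to sketch. Below is a minimal illustration, not the released im2txt code: it assumes tf.keras's stock InceptionV3 as the image encoder and made-up vocabulary/embedding sizes, and only shows the training-schedule mechanics.

# Sketch (not the im2txt release) of the encoder/decoder captioner and
# the two-phase training schedule described above.
import tensorflow as tf

VOCAB_SIZE = 12000   # assumption: caption vocabulary size
EMBED_DIM = 512      # assumption: shared word/image embedding size

# Image encoder: Inception V3 pretrained on ImageNet, global-average pooled.
encoder = tf.keras.applications.InceptionV3(
    include_top=False, weights="imagenet", pooling="avg")

image_in = tf.keras.Input(shape=(299, 299, 3))
caption_in = tf.keras.Input(shape=(None,), dtype="int32")

# Project the 2048-d Inception feature into the word-embedding space and
# feed it to the LSTM decoder as the first "word" of the sequence.
img_embed = tf.keras.layers.Dense(EMBED_DIM)(encoder(image_in))
word_embed = tf.keras.layers.Embedding(VOCAB_SIZE, EMBED_DIM)(caption_in)
seq = tf.keras.layers.Concatenate(axis=1)(
    [tf.keras.layers.Reshape((1, EMBED_DIM))(img_embed), word_embed])
hidden = tf.keras.layers.LSTM(EMBED_DIM, return_sequences=True)(seq)
logits = tf.keras.layers.Dense(VOCAB_SIZE)(hidden)
model = tf.keras.Model([image_in, caption_in], logits)

# Phase 1: train only the language component. The randomly initialized
# LSTM would otherwise corrupt the pretrained vision weights.
encoder.trainable = False
model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-3),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
# model.fit(...)  # train on (image, caption) pairs until captions are sensible

# Phase 2: fine-tune vision and language jointly, with a much smaller step.
encoder.trainable = True
model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-5),
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True))
# model.fit(...)  # joint fine-tuning improves e.g. color descriptions

The point of the two compile() calls is the ordering constraint quoted above: the encoder stays frozen until the LSTM has learned to generate captions, and the joint phase uses a lower learning rate so gradients from the language side refine, rather than destroy, the pretrained vision features.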

 
Understanding Locally Competitive Networks (2014) Rupesh Kumar Srivastava, Jonathan Masci, Faustino Gomez, Jürgen Schmidhuber
"""
ReLU (Glorot et al., 2011), maxout (Goodfellow et al., 2013a) and LWTA (Srivastava et al., 2013) are quite unlike sigmoidal activation [...] A common theme [...] is that they are locally competitive. Maxout and LWTA utilize explicit competition between units in small groups within a layer, while in the case of the rectified linear function, the weighted input sum competes with a fixed value of 0. [...] We start from the observation that in locally competitive networks, a subnetwork of units has nonzero activations for each input pattern [OZ: huh]. Instead of treating a neural network as a complex function approximator, the expressive power of the network can be interpreted to be coming from its ability to activate different subsets of linear units for different patterns. We hypothesize that the network acts as a model that can switch between “submodels” (subnetworks) such that similar submodels respond to similar patterns. As evidence of this behavior, we analyze the activated subnetworks for a large subset of a dataset (which is not used for training) and show that the subnetworks activated for different examples exhibit a structure consistent with our hypothesis. These observations provide a unified explanation for improved credit assignment in locally competitive networks during training, which is believed to be the main reason for their success. Our new point of view suggests a link between these networks and competitive learning approaches of the past decades.
"""
https://arxiv.org/abs/1410.1165
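
To make the "active subnetwork" idea concrete, here is a small NumPy illustration (my own sketch, not the paper's code; the layer sizes are made up). For each input it computes the binary pattern of winning units under ReLU and under maxout/LWTA-style group competition; each distinct pattern selects a different linear subnetwork.

# Illustration (not from the paper): the binary "which units won" pattern
# for the locally competitive activations discussed above.
import numpy as np

rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))  # hypothetical layer: 8 linear units, 4 inputs

def relu_mask(z):
    # ReLU: each unit competes with the fixed value 0; active iff z_i > 0.
    return z > 0

def maxout_mask(z, group=2):
    # Maxout: units compete within groups of `group`; only the max wins.
    z = z.reshape(-1, group)
    mask = np.zeros_like(z, dtype=bool)
    mask[np.arange(z.shape[0]), z.argmax(axis=1)] = True
    return mask.ravel()

def lwta_mask(z, group=2):
    # LWTA selects winners exactly like maxout; the difference is the
    # output (losers are zeroed, layer width is preserved), so the
    # active-subnetwork mask is the same.
    return maxout_mask(z, group)

for x in [rng.normal(size=4), rng.normal(size=4)]:
    z = W @ x
    print(relu_mask(z).astype(int), maxout_mask(z).astype(int))

For inputs where the mask stays constant, the layer computes a single linear map; the expressive power the paper points to comes from switching among these masks ("submodels") as the input changes, with similar inputs tending to activate similar subnetworks.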
