Oleg Zabluda's blog
Friday, September 23, 2016
 
Powerlifting Motivation - Legendary Moments in Deadlift History
https://www.youtube.com/watch?v=SL-8fzlmQU0

 
Aspiring immunologist Maria Birukova, a fourth-year graduate student in the MD/PhD program at the Stanford School of Medicine, died in a climbing accident on Bear Creek Spire. As the pair was traversing the [OZ: 4th class] route, the victim, a 26-year-old woman from the San Francisco Bay Area, lost her footing and fell between 800 and 1,000 feet. She died on Sept. 18.

https://www.facebook.com/InyoCountySheriffsOffice/posts/647481465434424

 
KITTI dataset
https://www.youtube.com/watch?v=KXpZ6B1YB_k

KITTI dataset with ground truth 3D bounding box annotations and roads/building outlines from OSM overlaid
https://www.youtube.com/watch?v=_imrrzn8NDk

The KITTI Dataset home
http://www.cvlibs.net/datasets/kitti/

Vision meets Robotics: The KITTI Dataset (2013) Andreas Geiger, et al
http://www.cvlibs.net/publications/Geiger2013IJRR.pdf

KITTI Vision Benchmark: OpenCV's datasets module implements loading it:
http://docs.opencv.org/3.0-beta/modules/datasets/doc/datasets/slam_kitti.html
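For reference, a minimal Python sketch of reading the ground-truth annotations mentioned above, assuming the standard KITTI object-label text format (one object per line, 15 whitespace-separated fields, per the KITTI devkit readme); the example path is a placeholder:

from dataclasses import dataclass
from typing import List

@dataclass
class KittiObject:
    kind: str                 # 'Car', 'Pedestrian', 'Cyclist', 'DontCare', ...
    truncated: float          # 0.0 (fully visible) .. 1.0 (fully truncated)
    occluded: int             # 0 visible, 1 partly, 2 largely occluded, 3 unknown
    alpha: float              # observation angle, [-pi, pi]
    bbox: List[float]         # 2D image box: left, top, right, bottom (pixels)
    dimensions: List[float]   # 3D box: height, width, length (meters)
    location: List[float]     # 3D center in camera coordinates: x, y, z (meters)
    rotation_y: float         # yaw around the camera's Y axis, [-pi, pi]

def load_kitti_labels(path: str) -> List[KittiObject]:
    objects = []
    with open(path) as f:
        for line in f:
            v = line.split()
            if len(v) < 15:
                continue  # skip blank or malformed lines
            objects.append(KittiObject(
                kind=v[0],
                truncated=float(v[1]),
                occluded=int(float(v[2])),
                alpha=float(v[3]),
                bbox=[float(x) for x in v[4:8]],
                dimensions=[float(x) for x in v[8:11]],
                location=[float(x) for x in v[11:14]],
                rotation_y=float(v[14]),
            ))
    return objects

# Example (hypothetical path):
#   objs = load_kitti_labels('training/label_2/000000.txt')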

 
A Compilation of Robots Falling Down at the DARPA Robotics Challenge (IEEE Spectrum)
https://www.youtube.com/watch?v=g0TaYhjpOfo (0:48 is my favorite)

Compare to (Expectation vs Reality):

The Terminator (1984) HD Intro
https://www.youtube.com/watch?v=CTvvCft3gpo

Terminator 2 - Opening Scene (HD)
https://www.youtube.com/watch?v=_Mg7qKstnPk

Terminator 3 - Opening Titles
https://www.youtube.com/watch?v=ACfIv8UUJag

 
Learning to Segment
"""
DeepMask segmentation framework coupled with our new SharpMask segment refinement module. Together, they have enabled FAIR's machine vision systems to detect and precisely delineate every object in an image. The final stage of our recognition pipeline uses a specialized convolutional net, which we call MultiPathNet, to label each object mask with the object type it contains (e.g. person, dog, sheep).
[...]
The technique we use in DeepMask is to think of segmentation as a very large number of binary classification problems. First, for every (overlapping) patch in an image we ask: Does this patch contain an object? Second, if the answer to the first question is yes for a given patch, then for every pixel in the patch we ask: Is that pixel part of the central object in the patch?
[...]
DeepMask employs a fairly traditional feedforward deep network [...] upper layers tend to capture more semantic concepts such as the presence of an animal's face or limbs. By design, these upper-layer features are computed at a fairly low spatial resolution (for both computational reasons and in order to be invariant to small shifts in pixel locations). This presents a problem for mask prediction: The upper layer features can be used to predict masks that capture the general shape of an object but fail to precisely capture object boundaries.

Which brings us to SharpMask. SharpMask refines the output of DeepMask, generating higher-fidelity masks that more accurately delineate object boundaries. While DeepMask predicts coarse masks in a feedforward pass through the network, SharpMask reverses the flow of information in a deep network and refines the predictions made by DeepMask by using features from progressively earlier layers in the network. Think of it this way: To capture general object shape, you have to have a high-level understanding of what you are looking at (DeepMask), but to accurately place the boundaries you need to look back at lower-level features all the way down to the pixels (SharpMask).
[...]
DeepMask knows nothing about specific object types, so while it can delineate both a dog and a sheep, it can't tell them apart. Plus, DeepMask is not very selective and can generate masks for image regions that are not especially interesting. So how do we narrow down the pool of relevant masks and identify the objects that are actually present?

As you might expect, we turn to deep neural networks once again. Given a mask generated by DeepMask, we train a separate deep net to classify the object type of each mask (and “none” is a valid answer as well). Here we are following the foundational paradigm called Region-CNN, or RCNN for short, pioneered by Ross Girshick (now also a member of FAIR). RCNN is a two-stage procedure where a first stage is used to draw attention to certain image regions, and in a second stage a deep net is used to identify the objects present. When RCNN was developed, the first stage of processing available was fairly primitive. By using DeepMask as the first stage for RCNN and exploiting the power of deep networks we get a significant boost in detection accuracy and also gain the ability to segment objects.

To further boost performance, we also focused on using a specialized network architecture to classify each mask (the second stage of RCNN). As we discussed, real-world photographs contain objects at multiple scales, in context and among clutter, and under frequent occlusion. Standard deep nets can have difficulty in such situations. To address this, we proposed a modified network that we call MultiPathNet. As its name implies, MultiPathNet allows information to flow along multiple paths through the net, enabling it to exploit information at multiple image scales and in surrounding image context.

In summary, our object detection system follows a three stage procedure: (1) DeepMask generates initial object masks, (2) SharpMask refines these masks, and finally (3) MultiPathNet identifies the objects delineated by each mask.
"""
https://research.facebook.com/blog/learning-to-segment/
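To make the quoted three-stage pipeline concrete, here is a hedged PyTorch sketch. This is not FAIR's actual architecture (their models build on deep pretrained trunks and are trained on COCO); the layer sizes, the refinement rule, and all class names below are illustrative assumptions. The structure, though, follows the post: a trunk with a per-patch objectness score and a coarse mask head (DeepMask), top-down refinement that mixes early high-resolution features back in (SharpMask), and a classifier over each masked region where "none" is a valid label (MultiPathNet's role as the RCNN second stage):

import torch
import torch.nn as nn
import torch.nn.functional as F

# Stage 1 (DeepMask): for each image patch, an objectness score
# ("does this patch contain an object?") plus coarse per-pixel mask logits.
class DeepMaskSketch(nn.Module):
    def __init__(self):
        super().__init__()
        # Illustrative trunk; the real system uses a deep pretrained network.
        self.block1 = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                                    nn.MaxPool2d(2))   # early: high-res detail
        self.block2 = nn.Sequential(nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(),
                                    nn.MaxPool2d(2))   # late: semantic, low-res
        self.score = nn.Linear(128, 1)     # whole-patch objectness logit
        self.mask = nn.Conv2d(128, 1, 1)   # coarse mask logits, low resolution

    def forward(self, patch):                  # patch: (N, 3, H, W)
        f1 = self.block1(patch)                # (N, 64, H/2, W/2)
        f2 = self.block2(f1)                   # (N, 128, H/4, W/4)
        score = self.score(f2.mean(dim=(2, 3)))
        return score, self.mask(f2), f1

# Stage 2 (SharpMask): refine the coarse mask top-down by mixing in
# earlier, higher-resolution features to recover object boundaries.
class SharpMaskRefineSketch(nn.Module):
    def __init__(self, skip_channels=64):
        super().__init__()
        self.merge = nn.Conv2d(1 + skip_channels, 1, 3, padding=1)

    def forward(self, coarse, early_feat):
        up = F.interpolate(coarse, size=early_feat.shape[2:],
                           mode='bilinear', align_corners=False)
        return self.merge(torch.cat([up, early_feat], dim=1))

# Stage 3 (MultiPathNet's role): classify each masked region; a "none"
# class lets the classifier reject uninteresting masks.
class MaskClassifierSketch(nn.Module):
    def __init__(self, num_classes):
        super().__init__()
        self.head = nn.Sequential(nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
                                  nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                  nn.Linear(32, num_classes))

    def forward(self, patch, mask_logits):
        m = F.interpolate(mask_logits, size=patch.shape[2:],
                          mode='bilinear', align_corners=False)
        return self.head(patch * torch.sigmoid(m))  # suppress background pixels

# Wiring the three stages together on dummy patches:
deepmask, sharpmask = DeepMaskSketch(), SharpMaskRefineSketch()
classifier = MaskClassifierSketch(num_classes=81)   # e.g. 80 types + "none"
x = torch.randn(2, 3, 64, 64)
score, coarse, early = deepmask(x)      # objectness + coarse masks
refined = sharpmask(coarse, early)      # sharper mask logits
labels = classifier(x, refined)         # per-mask class logits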

 
Spatial Transformer Networks (2015) Max Jaderberg, Karen Simonyan, Andrew Zisserman, Koray Kavukcuoglu
"""
Convolutional Neural Networks [...] are still limited by the lack of ability to be spatially invariant to the input data in a computationally and parameter efficient manner. In this work we introduce a new learnable module, the Spatial Transformer, which explicitly allows the spatial manipulation of data within the network. This differentiable module can be inserted into existing convolutional architectures, giving neural networks the ability to actively spatially transform feature maps, conditional on the feature map itself, without any extra training supervision or modification to the optimisation process. We show that the use of spatial transformers results in models which learn invariance to translation, scale, rotation and more generic warping, resulting in state-of-the-art performance on several benchmarks, and for a number of classes of transformations.
"""
https://arxiv.org/abs/1506.02025
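A minimal PyTorch sketch of the module the abstract describes: a localization net regresses a 2D affine transform from the feature map itself, then a grid generator and differentiable sampler warp the map (PyTorch's affine_grid/grid_sample; the localization net below is an illustrative stand-in, not the paper's):

import torch
import torch.nn as nn
import torch.nn.functional as F

# The whole module is differentiable, so it trains with plain backprop,
# with no extra supervision and no change to the optimization process.
class SpatialTransformerSketch(nn.Module):
    def __init__(self, channels):
        super().__init__()
        # Toy localization net predicting the 6 affine parameters.
        self.loc = nn.Sequential(
            nn.AdaptiveAvgPool2d(8), nn.Flatten(),
            nn.Linear(channels * 64, 32), nn.ReLU(),
            nn.Linear(32, 6),
        )
        # Start at the identity transform so early training leaves maps unchanged.
        self.loc[-1].weight.data.zero_()
        self.loc[-1].bias.data.copy_(torch.tensor([1., 0., 0., 0., 1., 0.]))

    def forward(self, x):                     # x: (N, C, H, W)
        theta = self.loc(x).view(-1, 2, 3)    # per-sample affine matrix
        grid = F.affine_grid(theta, x.size(), align_corners=False)
        return F.grid_sample(x, grid, align_corners=False)

# Insert anywhere between layers of an existing CNN, e.g.:
#   stn = SpatialTransformerSketch(channels=16)
#   warped = stn(feature_map)   # same shape, spatially transformed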

 
ImageNet Large Scale Visual Recognition Challenge (2014) Olga Russakovsky, Jia Deng, Hao Su, [...] Andrej Karpathy, Aditya Khosla, [...] Li Fei-Fei
"""
This paper describes the creation of this benchmark dataset [...] discuss the challenges of collecting large-scale ground truth annotation,
"""
http://arxiv.org/abs/1409.0575
