Friday, September 23, 2016
Learning to Segment
"""
[...] DeepMask segmentation framework coupled with our new SharpMask segment refinement module. Together, they have enabled FAIR's machine vision systems to detect and precisely delineate every object in an image. The final stage of our recognition pipeline uses a specialized convolutional net, which we call MultiPathNet, to label each object mask with the object type it contains (e.g. person, dog, sheep).
[...]
The technique we use in DeepMask is to think of segmentation as a very large number of binary classification problems. First, for every (overlapping) patch in an image we ask: Does this patch contain an object? Second, if the answer to the first question is yes for a given patch, then for every pixel in the patch we ask: Is that pixel part of the central object in the patch?
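To make the two questions concrete, here is a minimal PyTorch-style sketch (mine, not from the post): a shared trunk feeds two heads, one producing a single objectness logit per patch and one producing per-pixel mask logits. The layer sizes are illustrative, not the published DeepMask architecture.

import torch
import torch.nn as nn

class DeepMaskSketch(nn.Module):
    def __init__(self):
        super().__init__()
        # Shared trunk over a fixed-size image patch.
        self.trunk = nn.Sequential(
            nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
            nn.Conv2d(64, 128, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),
        )
        # Question 1: does this patch contain a (centered) object?
        self.score_head = nn.Sequential(
            nn.AdaptiveAvgPool2d(1), nn.Flatten(), nn.Linear(128, 1))
        # Question 2: is each pixel part of the central object?
        self.mask_head = nn.Conv2d(128, 1, 1)

    def forward(self, patch):                  # patch: (N, 3, H, W)
        feats = self.trunk(patch)              # (N, 128, H/4, W/4)
        objectness = self.score_head(feats)    # (N, 1) logit
        mask = self.mask_head(feats)           # coarse per-pixel logits
        return objectness, nn.functional.interpolate(
            mask, size=patch.shape[-2:], mode="bilinear", align_corners=False)

model = DeepMaskSketch()
score, mask = model(torch.randn(1, 3, 224, 224))   # one 224x224 patch

Applying this to every overlapping patch at every scale turns segmentation into a huge batch of independent binary decisions.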
[...]
DeepMask employs a fairly traditional feedforward deep network [...] upper layers tend to capture more semantic concepts such as the presence of an animal's face or limbs. By design, these upper-layer features are computed at a fairly low spatial resolution (for both computational reasons and in order to be invariant to small shifts in pixel locations). This presents a problem for mask prediction: The upper-layer features can be used to predict masks that capture the general shape of an object but fail to precisely capture object boundaries.
Which brings us to SharpMask. SharpMask refines the output of DeepMask, generating higher-fidelity masks that more accurately delineate object boundaries. While DeepMask predicts coarse masks in a feedforward pass through the network, SharpMask reverses the flow of information in a deep network and refines the predictions made by DeepMask by using features from progressively earlier layers in the network. Think of it this way: To capture general object shape, you have to have a high-level understanding of what you are looking at (DeepMask), but to accurately place the boundaries you need to look back at lower-level features all the way down to the pixels (SharpMask).
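A rough sketch of one such top-down refinement step, assuming the bottom-up pass saved its intermediate ("skip") features; the module name and channel counts are made up:

import torch
import torch.nn as nn
import torch.nn.functional as F

class RefinementModule(nn.Module):
    """Merge a coarse mask encoding with earlier, higher-resolution features."""
    def __init__(self, skip_channels, mask_channels):
        super().__init__()
        self.merge = nn.Conv2d(skip_channels + mask_channels,
                               mask_channels, 3, padding=1)

    def forward(self, mask_feats, skip_feats):
        # Upsample the coarse (top-down) mask encoding to the resolution of
        # the saved bottom-up features, then let the low-level features
        # sharpen the object boundary.
        mask_feats = F.interpolate(mask_feats, size=skip_feats.shape[-2:],
                                   mode="bilinear", align_corners=False)
        return F.relu(self.merge(torch.cat([skip_feats, mask_feats], dim=1)))

Stacking one such module per trunk layer, from deepest to shallowest, progressively restores spatial resolution until the predicted mask reflects the boundary detail visible in the early layers.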
[...]
DeepMask knows nothing about specific object types, so while it can delineate both a dog and a sheep, it can't tell them apart. Plus, DeepMask is not very selective and can generate masks for image regions that are not especially interesting. So how do we narrow down the pool of relevant masks and identify the objects that are actually present?
As you might expect, we turn to deep neural networks once again. Given a mask generated by DeepMask, we train a separate deep net to classify the object type of each mask (and “none” is a valid answer as well). Here we are following the foundational paradigm called Region-CNN, or RCNN for short, pioneered by Ross Girshick (now also a member of FAIR). RCNN is a two-stage procedure: a first stage draws attention to certain image regions, and in a second stage a deep net identifies the objects present. When RCNN was first developed, the available first-stage processing was fairly primitive. By using DeepMask as the first stage for RCNN and exploiting the power of deep networks, we get a significant boost in detection accuracy and also gain the ability to segment objects.
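In code, the second stage reduces to ordinary classification over the region each mask proposes, with one extra class meaning “none”. A hedged sketch, where the crop-and-resize step, class count, and tiny stand-in classifier are all placeholders for the real trained network:

import torch.nn as nn
import torch.nn.functional as F

NUM_CLASSES = 80            # e.g. the COCO categories
NONE_CLASS = NUM_CLASSES    # extra index meaning "no object here"

# Tiny stand-in classifier; the real second stage is a full deep net.
classifier = nn.Sequential(
    nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
    nn.AdaptiveAvgPool2d(1), nn.Flatten(),
    nn.Linear(32, NUM_CLASSES + 1),          # +1 for the "none" class
)

def classify_region(image, box, size=(64, 64)):
    """Crop the region a mask proposes, resize it, and score every class."""
    x0, y0, x1, y1 = box                     # box: bounding box of the mask
    crop = image[:, :, y0:y1, x0:x1]         # image: (N, 3, H, W) tensor
    crop = F.interpolate(crop, size=size, mode="bilinear", align_corners=False)
    return classifier(crop).softmax(dim=-1)  # probabilities incl. "none"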
To further boost performance, we also focused on using a specialized network architecture to classify each mask (the second stage of RCNN). As we discussed, real-world photographs contain objects at multiple scales, in context and among clutter, and under frequent occlusion. Standard deep nets can have difficulty in such situations. To address this, we proposed a modified network that we call MultiPathNet. As its name implies, MultiPathNet allows information to flow along multiple paths through the net, enabling it to exploit information at multiple image scales and in surrounding image context.
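One way to picture the multiple paths is as parallel feature extractors looking at the same region with progressively more surrounding context, fused before classification. This sketch of mine only illustrates that information flow; the actual MultiPathNet architecture differs in its details:

import torch
import torch.nn as nn

class MultiPathSketch(nn.Module):
    def __init__(self, num_paths=3, num_classes=80):
        super().__init__()
        # One shared feature extractor applied along every path, for brevity;
        # the real network uses richer per-path processing.
        self.path = nn.Sequential(
            nn.Conv2d(3, 32, 3, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.fuse = nn.Linear(32 * num_paths, num_classes + 1)  # +1 for "none"

    def forward(self, crops):
        # crops: one tensor per path, the same region cropped with
        # progressively more surrounding context (resized as needed).
        feats = [self.path(c) for c in crops]
        return self.fuse(torch.cat(feats, dim=1))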
In summary, our object detection system follows a three-stage procedure: (1) DeepMask generates initial object masks, (2) SharpMask refines these masks, and finally (3) MultiPathNet identifies the objects delineated by each mask.
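As glue pseudocode, with hypothetical function names standing in for the three trained models described above:

def detect_objects(image):
    # deepmask_propose, sharpmask_refine, and multipathnet_classify are
    # hypothetical stand-ins for stages (1), (2), and (3).
    masks = deepmask_propose(image)                        # (1) coarse masks
    masks = [sharpmask_refine(image, m) for m in masks]    # (2) sharpen boundaries
    labels = [multipathnet_classify(image, m) for m in masks]  # (3) name each object
    return [(m, lbl) for m, lbl in zip(masks, labels) if lbl != "none"]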
"""
https://research.facebook.com/blog/learning-to-segment/
Labels: Oleg Zabluda