Oleg Zabluda's blog
Friday, October 07, 2016
 
Hypercolumns for Object Segmentation and Fine-grained Localization (2014) Bharath Hariharan, Pablo Arbeláez, Ross Girshick, Jitendra Malik
"""
Recognition algorithms based on convolutional networks (CNNs) typically use the output of the last layer as feature representation. However, the information in this layer may be too coarse to allow precise localization. On the contrary, earlier layers may be precise in localization but will not capture semantics. To get the best of both worlds, we define the hypercolumn at a pixel as the vector of activations of all CNN units above that pixel. Using hypercolumns as pixel descriptors, we show results on three fine-grained localization tasks: simultaneous detection and segmentation [...], keypoint localization, [...] [Pose estimation] and part labeling,
[...]
Our hypothesis is that the information of interest is distributed over all levels of the CNN and should be exploited in this way. We define the “hypercolumn” at a given input location as the outputs of all units above that location at all layers of the CNN, stacked into one vector. (Because adjacent layers are strongly correlated, in practice we need not consider all layers but can simply sample a few.) Figure 1 shows a visualization of the idea. We borrow the term “hypercolumn” from neuroscience, where it is used to describe a set of V1 neurons sensitive to edges at multiple orientations and multiple frequencies arranged in a columnar structure [24]. However, our hypercolumn includes not just edge detectors but also more semantic units and is thus a more general notion. [OZ: Maybe it's the same in the brain]
[...]
However, because of subsampling and pooling operations in the CNN, these feature maps need not be at the same resolution as the input or the target output size. So which unit lies above a particular location is ambiguous. We get around this by simply resizing each feature map to the size we want with bilinear interpolation.
"""
https://arxiv.org/abs/1411.5752
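
[OZ: A minimal NumPy sketch of the idea quoted above, not the authors' code. It resizes each feature map to a common size with bilinear interpolation and stacks them along the channel axis, so the hypercolumn at a pixel is one column of the result. The function names, the (C, H, W) layout, the corner-aligned sampling grid, and the layer shapes are all my assumptions for illustration.]

```python
import numpy as np


def bilinear_resize(fmap, out_h, out_w):
    """Resize a (C, H, W) feature map to (C, out_h, out_w) bilinearly.

    Assumption: corner-aligned sampling, i.e. the first and last output
    pixels map exactly onto the first and last input pixels.
    """
    _, h, w = fmap.shape
    # Fractional input coordinates for every output row and column.
    ys = np.linspace(0, h - 1, out_h)
    xs = np.linspace(0, w - 1, out_w)
    # Indices of the neighbours above/left and below/right (clamped).
    y0 = np.floor(ys).astype(int)
    x0 = np.floor(xs).astype(int)
    y1 = np.minimum(y0 + 1, h - 1)
    x1 = np.minimum(x0 + 1, w - 1)
    # Interpolation weights, shaped to broadcast over (C, out_h, out_w).
    wy = (ys - y0)[None, :, None]
    wx = (xs - x0)[None, None, :]
    # Interpolate along x on the two neighbouring rows, then along y.
    top = fmap[:, y0][:, :, x0] * (1 - wx) + fmap[:, y0][:, :, x1] * wx
    bot = fmap[:, y1][:, :, x0] * (1 - wx) + fmap[:, y1][:, :, x1] * wx
    return top * (1 - wy) + bot * wy


def hypercolumns(feature_maps, out_h, out_w):
    """Stack resized feature maps into a (sum of C, out_h, out_w) array."""
    resized = [bilinear_resize(f, out_h, out_w) for f in feature_maps]
    return np.concatenate(resized, axis=0)


# Hypothetical activations from three sampled layers of some CNN:
# resolution drops and channel count grows with depth.
rng = np.random.default_rng(0)
layers = [
    rng.standard_normal((4, 8, 8)),
    rng.standard_normal((8, 4, 4)),
    rng.standard_normal((16, 2, 2)),
]
hc = hypercolumns(layers, 8, 8)
print(hc.shape)        # (28, 8, 8)
print(hc[:, 3, 5].shape)  # hypercolumn at pixel (3, 5): (28,)
```

[OZ: Concatenating along channels gives every pixel a descriptor that mixes fine early-layer detail with coarse late-layer semantics. A real pipeline would take the activations from an actual network and would likely use a framework's built-in resize instead of this hand-rolled one.]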
