Oleg Zabluda's blog
Friday, September 30, 2016
 
Variable Rate Image Compression with Recurrent Neural Networks (2015) George Toderici et al
"""
standard autoencoders operate under a number of hard constraints that have so far made them infeasible as a drop-in replacement for standard image codecs. Some of these constraints are that variable rate encoding is typically not possible (one network is trained per compression rate); the visual quality of the output is hard to ensure; and they’re typically trained for a particular scale, being able to capture redundancy only at that scale.

We explore several different ways in which neural network-driven image compression can improve compression rates while allowing similar flexibility to modern codecs. To achieve this flexibility, the network architectures we discuss must meet all of the following requirements: (1) the compression rate should be capable of being restricted to a prior bit budget; (2) the compressor should be able to encode simpler patches more cheaply
[...]
A typical compressing autoencoder has three parts: (1) an encoder which consumes an input (e.g., a fixed-dimension image or patch) and transforms it into (2) a bottleneck representing the compressed data, which can then be transformed by (3) a decoder into something resembling the original input. These three elements are trained end-to-end, but during deployment the encoder and decoder are normally used independently. The bottleneck is often simply a flat neural net layer, which allows the compression rate and visual fidelity of the encoded images to be controlled by adjusting the number of nodes in this layer before training. For some types of autoencoder, encoding the bottleneck as a simple bit vector can be beneficial (Krizhevsky & Hinton, 2011). In neural net-based classification tasks, images are repeatedly downsampled through convolution and pooling operations, and the entire output of the network might be contained in just a single node. In the decoder half of an autoencoder, however, the network must proceed in the opposite direction and convert a short bit vector into a much larger image or image patch. When this upsampling process is spatially-aware, resembling a “backward convolution,” it is commonly referred to as deconvolution (Long et al., 2014).
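[OZ: A minimal PyTorch-style sketch of such a three-part compressing autoencoder, purely illustrative — the patch size, layer widths and bottleneck size are my own assumptions, not the paper's architecture. The bottleneck is a flat tanh layer whose width is fixed before training, and the decoder upsamples with "backward convolutions" (ConvTranspose2d):]

import torch
import torch.nn as nn

class CompressingAutoencoder(nn.Module):
    def __init__(self, bottleneck_size=64):
        super().__init__()
        # (1) Encoder: downsample a 32x32 RGB patch into (2) a flat bottleneck.
        self.encoder = nn.Sequential(
            nn.Conv2d(3, 32, kernel_size=4, stride=2, padding=1),   # 32x32 -> 16x16
            nn.ReLU(),
            nn.Conv2d(32, 64, kernel_size=4, stride=2, padding=1),  # 16x16 -> 8x8
            nn.ReLU(),
            nn.Flatten(),
            nn.Linear(64 * 8 * 8, bottleneck_size),
            nn.Tanh(),                                              # values in [-1, 1]
        )
        # (3) Decoder: "backward convolutions" upsample the bottleneck back to 32x32.
        self.decoder = nn.Sequential(
            nn.Linear(bottleneck_size, 64 * 8 * 8),
            nn.ReLU(),
            nn.Unflatten(1, (64, 8, 8)),
            nn.ConvTranspose2d(64, 32, kernel_size=4, stride=2, padding=1),  # 8x8 -> 16x16
            nn.ReLU(),
            nn.ConvTranspose2d(32, 3, kernel_size=4, stride=2, padding=1),   # 16x16 -> 32x32
            nn.Sigmoid(),
        )

    def forward(self, x):
        code = self.encoder(x)              # compressed representation
        return self.decoder(code), code     # reconstruction, bottleneck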
[...]
To make it possible to transmit incremental information, the design should take into account the fact that image decoding will be progressive. With this design goal in mind, we can consider architectures that are built on top of residuals with the goal of minimizing the residual error in the reconstruction as additional information becomes available to the decoder [...] a varying number of bits per patch by allowing a varying number of iterations of the encoder.
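[OZ: A toy sketch of that progressive/residual idea, with generic per-iteration encoder/decoder stand-ins (hypothetical names, not the paper's networks): each iteration spends another fixed-size chunk of bits on the residual left by the previous iterations, so stopping earlier yields fewer bits per patch:]

import torch

def progressive_encode_decode(patch, stage_encoder, stage_decoder, num_iterations):
    """stage_encoder/stage_decoder are assumed stand-ins for one residual stage."""
    reconstruction = torch.zeros_like(patch)
    bits = []
    for _ in range(num_iterations):
        residual = patch - reconstruction                 # what is still missing
        code = stage_encoder(residual)                    # fixed number of bits per iteration
        bits.append(code)
        reconstruction = reconstruction + stage_decoder(code)  # additive refinement
    # total bits grow linearly with num_iterations; quality improves with each pass
    return bits, reconstruction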
[...]
In our networks, we employ a binarization technique first proposed by Williams (1992), and similar to Krizhevsky & Hinton (2011) and Courbariaux et al. (2015).
[...]
The binarization process consists of two parts. The first part consists of generating the required number of outputs (equal to the desired number of output bits) in the continuous interval [−1, 1]. The second part involves taking this real-valued representation as input and producing a discrete output in the set {−1, 1} for each value. For the first step in the binarization process, we use a fully-connected layer with tanh activations. For the second part, following Raiko et al. (2015) [OZ: stochastic rounding]
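[OZ: A minimal sketch of the second (stochastic rounding) part as I read it, not the paper's code: tanh outputs in [-1, 1] are rounded to {-1, 1} with P(+1) = (1+x)/2, and the rounding is treated as identity in the backward pass (straight-through gradient):]

import torch

def stochastic_binarize(x):
    """x: tanh activations in [-1, 1]. Returns samples in {-1, 1}."""
    prob_one = (x + 1.0) / 2.0                        # P(output = +1)
    sample = torch.where(torch.rand_like(x) < prob_one,
                         torch.ones_like(x), -torch.ones_like(x))
    # Straight-through estimator: forward pass uses the discrete sample,
    # backward pass propagates gradients as if the rounding were identity.
    return x + (sample - x).detach()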

[OZ: See very interesting correspondence between...]
3.3 FEED-FORWARD FULLY-CONNECTED RESIDUAL ENCODER
3.4 LSTM-BASED COMPRESSION
"""
https://arxiv.org/abs/1511.06085
