Oleg Zabluda's blog
Friday, November 11, 2016
 
LipNet: Sentence-level Lipreading (2016) Yannis M. Assael et al.
"""
More recent deep lipreading approaches are end-to-end trainable (Wand et al., 2016; Chung & Zisserman, 2016a). All existing works, however, perform only word classification, not sentence-level sequence prediction. [...] LipNet, a model that maps a variable-length sequence of video frames to text, making use of spatiotemporal convolutions, an LSTM recurrent network, and the connectionist temporal classification loss, trained entirely end-to-end. To the best of our knowledge, LipNet is the first lipreading model to operate at sentence-level, using a single end-to-end speaker-independent deep model to simultaneously learn spatiotemporal visual features and a sequence model. On the GRID corpus, LipNet achieves 93.4% accuracy, outperforming experienced human lipreaders and the previous 79.6% state-of-the-art accuracy.
"""
https://arxiv.org/abs/1611.01599
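
The quote mentions the connectionist temporal classification (CTC) loss, which is what lets a per-frame model emit text of a different length than the video. Here is a minimal sketch of the simplest CTC decoding rule (best path: take the top symbol per frame, collapse repeats, drop blanks). It is only an illustration, not the paper's code or necessarily its decoder; the function name `ctc_greedy_decode`, the `"_"` blank symbol, and the toy scores are all my own assumptions.

```python
from itertools import groupby

BLANK = "_"  # hypothetical blank symbol, chosen for this sketch


def ctc_greedy_decode(frame_scores, alphabet, blank=BLANK):
    """Best-path CTC decoding.

    frame_scores: one list of scores per video frame, with one score
                  per symbol in `alphabet` (same order).
    Returns the decoded string.
    """
    # 1. Pick the highest-scoring symbol for every frame.
    best_path = [
        alphabet[max(range(len(scores)), key=scores.__getitem__)]
        for scores in frame_scores
    ]
    # 2. Collapse runs of the same symbol into one.
    collapsed = [symbol for symbol, _ in groupby(best_path)]
    # 3. Drop the blank symbol.
    return "".join(symbol for symbol in collapsed if symbol != blank)


if __name__ == "__main__":
    # Toy example: 6 frames over a 4-symbol alphabet (made-up numbers).
    alphabet = [BLANK, "b", "i", "n"]
    frames = [
        [0.10, 0.70, 0.10, 0.10],  # b
        [0.20, 0.60, 0.10, 0.10],  # b
        [0.80, 0.10, 0.05, 0.05],  # blank
        [0.10, 0.10, 0.70, 0.10],  # i
        [0.10, 0.10, 0.60, 0.20],  # i
        [0.10, 0.10, 0.10, 0.70],  # n
    ]
    print(ctc_greedy_decode(frames, alphabet))  # prints "bin"
```

The blank is what allows genuine double letters: a path like "a a _ a" decodes to "aa", while "a a a" decodes to "a".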

https://www.youtube.com/watch?v=fa5QGremQf8
