Visual Speech Recognition
Abstract
Visual speech recognition (VSR), or automatic lip-reading, is the task of extracting speech
information from visual input. The addition of visual speech information has been shown to
improve the performance of traditional audio-only speech recognition (ASR) systems, and VSR
has hence been an active area of research since its inception. This thesis proposes a new VSR
system for isolated word recognition tasks, with a focus on the feature extraction methodology. A
novel two-stage feature extraction technique is proposed. Image-transform-based features,
namely the discrete cosine transform (DCT) and local binary patterns (LBP), are used. The use
of difference images for temporal feature extraction is also proposed.
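As a rough illustration of this two-stage idea, the sketch below computes low-frequency DCT
coefficients and a uniform-LBP histogram for each frame, then applies the same features to
successive difference images for the temporal stage. The libraries (OpenCV, scikit-image) and
all parameter values are illustrative assumptions, not the exact configuration used in this work.

```python
import cv2
import numpy as np
from skimage.feature import local_binary_pattern

def frame_features(gray, size=(64, 64), lbp_points=8, lbp_radius=1):
    """Low-frequency DCT coefficients plus a uniform-LBP histogram for one frame."""
    gray = cv2.resize(gray, size)            # fixed, even size (cv2.dct needs even dims)
    dct = cv2.dct(np.float32(gray) / 255.0)
    dct_feat = dct[:8, :8].flatten()         # keep an 8x8 block of low frequencies
    lbp = local_binary_pattern(gray, lbp_points, lbp_radius, method="uniform")
    hist, _ = np.histogram(lbp, bins=lbp_points + 2,
                           range=(0, lbp_points + 2), density=True)
    return np.concatenate([dct_feat, hist])

def sequence_features(frames):
    """Static per-frame features plus features of successive difference images.

    Assumes the frame sequence has been resampled to a fixed length, so that
    the resulting vector has the same dimensionality for every video.
    """
    static = [frame_features(f) for f in frames]
    diffs = [cv2.absdiff(b, a) for a, b in zip(frames, frames[1:])]
    temporal = [frame_features(d) for d in diffs]
    return np.concatenate(static + temporal)
```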
A new region of interest (ROI), which consists of the throat and lower jaw along with the
mouth, is also introduced. For ROI extraction, the Viola-Jones algorithm is used.
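A hedged sketch of this step is given below, using OpenCV's implementation of the Viola-Jones
detector (CascadeClassifier) to locate the face and then cropping an extended region that covers
the mouth, lower jaw, and throat. The geometry fractions are assumptions for illustration, not
the exact ROI definition used here.

```python
import cv2

# Standard frontal-face cascade shipped with opencv-python.
face_cascade = cv2.CascadeClassifier(
    cv2.data.haarcascades + "haarcascade_frontalface_default.xml")

def extract_roi(frame):
    """Return a grayscale crop covering the mouth, lower jaw, and throat."""
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    faces = face_cascade.detectMultiScale(gray, scaleFactor=1.1, minNeighbors=5)
    if len(faces) == 0:
        return None
    x, y, w, h = max(faces, key=lambda f: f[2] * f[3])   # keep the largest face
    # Take the lower third of the face box and extend it ~40% below the chin
    # to include the jaw and throat (illustrative fractions).
    top = y + int(0.65 * h)
    bottom = min(gray.shape[0], y + int(1.4 * h))
    return gray[top:bottom, x:x + w]
```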
Classification is done using a multi-class Support Vector Machine (SVM) model. The system
provides a simple,
yet effective way to extract features from the video input, and performs comparably to
some recent VSR systems that employ more complex techniques, such as lip modelling
or deep learning, to extract visual features.
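For completeness, a minimal sketch of the classification stage follows, assuming scikit-learn's
SVC, which handles the multi-class case internally via a one-vs-one scheme. The kernel choice
and regularisation constant are illustrative defaults rather than tuned values from this work.

```python
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

def train_word_classifier(X_train, y_train):
    """Fit a multi-class SVM on (n_samples, n_features) feature vectors."""
    clf = make_pipeline(StandardScaler(), SVC(kernel="rbf", C=1.0))
    clf.fit(X_train, y_train)
    return clf

# Usage: words = train_word_classifier(X_train, y_train).predict(X_test)
```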