
dc.contributor.advisor: Sreenivas, T V
dc.contributor.author: Suryanarayana, Venkata K
dc.date.accessioned: 2011-01-18T05:36:54Z
dc.date.accessioned: 2018-07-31T04:50:15Z
dc.date.available: 2011-01-18T05:36:54Z
dc.date.available: 2018-07-31T04:50:15Z
dc.date.issued: 2011-01-18
dc.date.submitted: 2009
dc.identifier.uri: https://etd.iisc.ac.in/handle/2005/1007
dc.description.abstract: The speech signal is inherently characterized by its variations in time, which are reflected as variations in frequency. These spectro-temporal changes arise from changes in the vocal tract, intonation, co-articulation, and the successive articulation of different phonetic sounds. In this thesis we seek to improve speech recognition performance through better feature parameters derived from a non-stationary model of speech. One effective means of modeling a general non-stationary signal is the AM-FM model, which can be extended to speech through a sub-band analysis that mimics auditory analysis. We explore new methods for estimating AM and FM parameters from non-uniform samples of the signal. The non-uniform sampling approach, combined with adaptive window estimation, offers an important advantage through multi-resolution analysis. We develop several new methods based on zero-crossing (ZC) intervals, local extrema intervals, and the signal derivative at ZCs as different sample measures of the signal, and explore their effectiveness for instantaneous frequency (IF) and instantaneous envelope (IE) estimation. For automatic speech recognition (ASR), we exploit auditory-motivated spectro-temporal information: an auditory filter bank is applied and features are derived from the instantaneous energy in each band using the non-linear energy operator over a longer window length. The temporal correlation in the signal is exploited by applying the DCT and retaining the lower few coefficients, which capture the trend of the energy in each band. The DCT coefficients from the different frequency bands are concatenated, and further spectral decorrelation is achieved through a KLT (Karhunen-Loeve Transform) of the concatenated feature vector. Changes in the vocal tract are well captured by changes in the formant structure; to emphasize these details for ASR, we define a temporal formant from the AM-FM parameters of each sub-band signal, using uniform wideband non-overlapping filters for the sub-band decomposition. The temporal evolution of a formant is represented by the lower-order DCT coefficients of the temporal formant in each band, and its use for ASR is explored. To address the robustness of ASR to noisy environmental conditions, we use a hybrid approach in which the speech signal is enhanced using statistical models of speech and noise; GMM-based statistical speech enhancement has been shown to be effective. We find that spectro-temporal features derived from the enhanced speech provide a further improvement in ASR performance. [en_US]
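As an illustration of the spectro-temporal feature extraction outlined in the abstract, the following Python sketch band-pass filters the signal into sub-bands, computes the non-linear (Teager-Kaiser) energy in each band over a long analysis window, keeps the lower DCT coefficients of each band's energy trajectory, and concatenates them into one feature vector. This is not the thesis implementation: the uniform Butterworth filter bank, the 200 ms window, and the number of retained DCT coefficients are illustrative assumptions, and the final KLT decorrelation stage is omitted.

    import numpy as np
    from scipy.signal import butter, lfilter
    from scipy.fftpack import dct

    def teager_energy(x):
        """Teager-Kaiser non-linear energy operator: psi[n] = x[n]^2 - x[n-1]*x[n+1]."""
        psi = np.empty_like(x)
        psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
        psi[0], psi[-1] = psi[1], psi[-2]       # simple edge handling
        return np.maximum(psi, 1e-12)           # keep energies positive for the log

    def spectro_temporal_features(x, fs=16000, n_bands=8, win_ms=200, n_dct=6):
        """Concatenate low-order DCT coefficients of per-band Teager-energy trajectories."""
        edges = np.linspace(100, fs / 2 - 100, n_bands + 1)   # assumed uniform band edges
        win = int(fs * win_ms / 1000)
        n_frames = len(x) // win
        feats = []
        for b in range(n_bands):
            lo, hi = edges[b] / (fs / 2), edges[b + 1] / (fs / 2)
            bcoef, acoef = butter(4, [lo, hi], btype='band')
            xb = lfilter(bcoef, acoef, x)                      # sub-band signal
            e = np.log(teager_energy(xb))                      # instantaneous energy in the band
            # one energy trajectory per long window; the low DCT coefficients keep its trend
            band_feats = [dct(e[i * win:(i + 1) * win], norm='ortho')[:n_dct]
                          for i in range(n_frames)]
            feats.append(np.array(band_feats))
        # a further KLT/PCA decorrelation of the concatenated vector would follow (omitted)
        return np.concatenate(feats, axis=1)                   # shape: (n_frames, n_bands * n_dct)

    # Usage example on one second of synthetic noise at 16 kHz
    x = np.random.randn(16000)
    F = spectro_temporal_features(x)
    print(F.shape)  # (5, 48) with the assumed settings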
dc.language.iso: en_US [en_US]
dc.relation.ispartofseries: G23375 [en_US]
dc.subject: Speech Processing (Artificial Intelligence) [en_US]
dc.subject: Speech Recognition [en_US]
dc.subject: Speech Signal Processing [en_US]
dc.subject: Automatic Speech Recognition (ASR) [en_US]
dc.subject: Robust Speech Recognition [en_US]
dc.subject: Amplitude Modulated (AM) And Frequency Modulated (FM) Modeling [en_US]
dc.subject: AM-FM Modeling [en_US]
dc.subject.classification: Computer Science [en_US]
dc.title: Spectro-Temporal Features For Robust Automatic Speech Recognition [en_US]
dc.type: Thesis [en_US]
dc.degree.name: MSc Engg [en_US]
dc.degree.level: Masters [en_US]
dc.degree.discipline: Faculty of Engineering [en_US]

