
dc.contributor.advisor: Sreenivas, T V
dc.contributor.author: Suryanarayana, Venkata K
dc.date.accessioned: 2011-01-18T05:36:54Z
dc.date.accessioned: 2018-07-31T04:50:15Z
dc.date.available: 2011-01-18T05:36:54Z
dc.date.available: 2018-07-31T04:50:15Z
dc.date.issued: 2011-01-18
dc.date.submitted: 2009
dc.identifier.uri: https://etd.iisc.ac.in/handle/2005/1007
dc.description.abstract: The speech signal is inherently characterized by its variations in time, which are reflected as variations in frequency. These spectro-temporal changes arise from changes in the vocal tract, intonation, co-articulation, and the successive articulation of different phonetic sounds. In this thesis we seek to improve speech recognition performance through better feature parameters derived from a non-stationary model of speech. One effective means of modeling a general non-stationary signal is the AM-FM model, which can be extended to speech through a sub-band analysis that mimics auditory analysis. We explore new methods for estimating AM and FM parameters from non-uniform samples of the signal. The non-uniform sampling approach, combined with adaptive window estimation, offers an important advantage through multi-resolution analysis. We develop several new methods based on zero-crossing (ZC) intervals, local extrema intervals, and the signal derivative at ZCs as different sample measures of the signal, and explore their effectiveness for instantaneous frequency (IF) and instantaneous envelope (IE) estimation. For automatic speech recognition (ASR), we exploit auditory-motivated spectro-temporal information: an auditory filter bank is applied and features are derived from the instantaneous energy in each band using the non-linear energy operator over a longer window length. The temporal correlation in the signal is exploited by applying the DCT and retaining the lower few coefficients, which capture the trend of the energy in each band. The DCT coefficients from the different frequency bands are concatenated, and further spectral decorrelation is achieved through a KLT (Karhunen-Loeve Transform) of the concatenated feature vector. Changes in the vocal tract are well captured by changes in the formant structure; to emphasize these details for ASR, we define a temporal formant from the AM-FM parameters of each sub-band signal, using uniform wideband non-overlapping filters for the sub-band decomposition. The temporal evolution of a formant is represented by the lower-order DCT coefficients of the temporal formant in each band, and its use for ASR is explored. To address the robustness of ASR to noisy environmental conditions, we use a hybrid approach in which the speech signal is enhanced using statistical models of speech and noise; GMM-based statistical speech enhancement has been shown to be effective. We find that spectro-temporal features derived from the enhanced speech provide a further improvement in ASR performance. [en_US]
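As an illustration of the spectro-temporal feature extraction outlined in the abstract, the following Python sketch band-pass filters the signal into sub-bands, computes the non-linear (Teager-Kaiser) energy in each band over a long analysis window, keeps the lower DCT coefficients of each band's energy trajectory, and concatenates them into one feature vector. This is not the thesis implementation: the uniform Butterworth filter bank, the 200 ms window, and the number of retained DCT coefficients are illustrative assumptions, and the final KLT decorrelation stage is omitted.

    import numpy as np
    from scipy.signal import butter, lfilter
    from scipy.fftpack import dct

    def teager_energy(x):
        """Teager-Kaiser non-linear energy operator: psi[n] = x[n]^2 - x[n-1]*x[n+1]."""
        psi = np.empty_like(x)
        psi[1:-1] = x[1:-1] ** 2 - x[:-2] * x[2:]
        psi[0], psi[-1] = psi[1], psi[-2]       # simple edge handling
        return np.maximum(psi, 1e-12)           # keep energies positive for the log

    def spectro_temporal_features(x, fs=16000, n_bands=8, win_ms=200, n_dct=6):
        """Concatenate low-order DCT coefficients of per-band Teager-energy trajectories."""
        edges = np.linspace(100, fs / 2 - 100, n_bands + 1)   # assumed uniform band edges
        win = int(fs * win_ms / 1000)
        n_frames = len(x) // win
        feats = []
        for b in range(n_bands):
            lo, hi = edges[b] / (fs / 2), edges[b + 1] / (fs / 2)
            bcoef, acoef = butter(4, [lo, hi], btype='band')
            xb = lfilter(bcoef, acoef, x)                      # sub-band signal
            e = np.log(teager_energy(xb))                      # instantaneous energy in the band
            # one energy trajectory per long window; the low DCT coefficients keep its trend
            band_feats = [dct(e[i * win:(i + 1) * win], norm='ortho')[:n_dct]
                          for i in range(n_frames)]
            feats.append(np.array(band_feats))
        # a further KLT/PCA decorrelation of the concatenated vector would follow (omitted)
        return np.concatenate(feats, axis=1)                   # shape: (n_frames, n_bands * n_dct)

    # Usage example on one second of synthetic noise at 16 kHz
    x = np.random.randn(16000)
    F = spectro_temporal_features(x)
    print(F.shape)  # (5, 48) with the assumed settings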
dc.language.iso: en_US [en_US]
dc.relation.ispartofseries: G23375 [en_US]
dc.subject: Speech Processing (Artificial Intelligence) [en_US]
dc.subject: Speech Recognition [en_US]
dc.subject: Speech Signal Processing [en_US]
dc.subject: Automatic Speech Recognition (ASR) [en_US]
dc.subject: Robust Speech Recognition [en_US]
dc.subject: Amplitude Modulated (AM) And Frequency Modulated (FM) Modeling [en_US]
dc.subject: AM-FM Modeling [en_US]
dc.subject.classification: Computer Science [en_US]
dc.title: Spectro-Temporal Features For Robust Automatic Speech Recognition [en_US]
dc.type: Thesis [en_US]
dc.degree.name: MSc Engg [en_US]
dc.degree.level: Masters [en_US]
dc.degree.discipline: Faculty of Engineering [en_US]

