Joint Evaluation Of Multiple Speech Patterns For Speech Recognition And Training

Nair, Nishanth Ulhas

dc.contributor.advisor	Sreenivas, T V
dc.contributor.author	Nair, Nishanth Ulhas
dc.date.accessioned	2009-12-09T10:04:28Z
dc.date.accessioned	2018-07-31T04:49:58Z
dc.date.available	2009-12-09T10:04:28Z
dc.date.available	2018-07-31T04:49:58Z
dc.date.issued	2009-12-09T10:04:28Z
dc.date.submitted	2009
dc.identifier.uri	https://etd.iisc.ac.in/handle/2005/630
dc.description.abstract	Improving speech recognition performance in the presence of noise and interference continues to be a challenging problem. Automatic Speech Recognition (ASR) systems work well when the test and training conditions match. In real-world environments, there is often a mismatch between testing and training conditions. Various factors, such as additive noise, acoustic echo, and speaker accent, affect speech recognition performance. Since ASR is a statistical pattern recognition problem, if the test patterns are unlike anything used to train the models, errors are bound to occur due to feature vector mismatch. Various approaches to robustness have been proposed in the ASR literature, mainly along two lines: (i) reducing the variability in the feature vectors, or (ii) modifying the statistical model parameters to suit the noisy condition. While some of those techniques are quite effective, we would like to examine robustness from a different perspective. Considering the analogy of human communication over telephones, it is quite common to ask the person speaking to us to repeat certain portions of their speech because we did not understand them. This happens more often in the presence of background noise, where the intelligibility of speech is affected significantly. Although the exact nature of how humans decode multiple repetitions of speech is not known, it is quite possible that we use the combined knowledge of the multiple utterances to decode the unclear part of the speech. The majority of ASR algorithms do not address this issue, except in very specific cases such as pronunciation modeling. We recognize that under very high noise conditions or bursty error channels, such as in packet communication where packets get dropped, it would be beneficial to take the approach of repeated utterances for robust ASR.
In this thesis, we have formulated a set of algorithms for joint evaluation/decoding to recognize noisy test utterances, and we use the same formulation for selective training of Hidden Markov Models (HMMs), again for robust performance. We first address joint recognition of multiple speech patterns given that they belong to the same class. We formulated this problem considering the patterns as isolated words. If there are K test patterns (K ≥ 2) of a word by a speaker, we show that it is possible to improve the speech recognition accuracy over independent single-pattern evaluation of test speech, for both clean and noisy speech. We also find the state sequence that best represents the K patterns. This formulation can also be extended to connected word recognition or continuous speech recognition. Next, we consider the benefits of joint multi-pattern likelihood for HMM training. In the usual HMM training, all the training data is used to arrive at the best possible parametric model. However, the training data may not all be genuine and may therefore contain labeling errors, noise corruption, or plain outlier exemplars. Such outliers result in poorer models and degrade speech recognition performance. It is therefore important to train the models selectively so that the outliers get a lower weight. Giving a lower weight to an entire outlier pattern has been addressed before in the speech recognition literature. However, it is possible that only some portions of a training pattern are corrupted, so only the corrupted portions of speech should be given a lower weight during HMM training, not the entire pattern. Since HMM training uses multiple patterns of speech from each class, we show that it is possible to use joint evaluation methods to selectively train HMMs such that only the corrupted portions of speech are given a lower weight and not the entire speech pattern.
Thus, we have addressed all three main tasks of an HMM so as to jointly utilize the availability of multiple patterns belonging to the same class. We evaluated the new algorithms on Isolated Word Recognition for both clean and noisy speech. Significant improvement in speech recognition performance is obtained, especially for speech affected by transient/burst noise.	en
dc.language.iso	en_US	en
dc.relation.ispartofseries	G22616	en
dc.subject	Speech Recognition	en
dc.subject	Robust Speech Recognition	en
dc.subject	Speech Recognition - Algorithms	en
dc.subject	Hidden Markov Models	en
dc.subject	Multi-Pattern Dynamic Time Warping	en
dc.subject	Multi-Pattern Joint Likelihood	en
dc.subject	Multiple Speech Patterns	en
dc.subject	Automatic Speech Recognition (ASR)	en
dc.subject.classification	Computer Science	en
dc.title	Joint Evaluation Of Multiple Speech Patterns For Speech Recognition And Training	en
dc.type	Thesis	en
dc.degree.name	MSc Engg	en
dc.degree.level	Masters	en
dc.degree.discipline	Faculty of Engineering	en

Files in this item

Name: G22616.pdf
Size: 3.400Mb
Format: PDF

This item appears in the following Collection(s)

Electrical Communication Engineering (ECE) [470]

Related items

Spectro-Temporal Features For Robust Automatic Speech Recognition

Language Identification Through Acoustic Sub-Word Units

Self-Supervised Learning Approaches for Content-Factor Extraction from Raw Speech
