Show simple item record

dc.contributor.advisor: Sreenivas, T V
dc.contributor.author: Stalin, Suryan.
dc.date.accessioned: 2025-10-30T11:06:53Z
dc.date.available: 2025-10-30T11:06:53Z
dc.date.submitted: 1994
dc.identifier.uri: https://etd.iisc.ac.in/handle/2005/7292
dc.description.abstract: Among the various neural network architectures and learning algorithms that have emerged recently, the multilayer perceptron (MLP) network with backpropagation learning is found to be the most effective for speech recognition, owing to its ability to form arbitrarily complex decision regions in the feature space and to the overall versatility of the backpropagation learning algorithm. This thesis addresses the issues of applying the MLP to phoneme recognition in continuous speech and proposes modifications to the network architecture and learning algorithms that can give improved performance. Feature representation of speech units, such as phonemes, is complex due to talker variability and contextual variability in continuous speech, which leads to difficulties for a pattern classifier. A typical MLP network classifier would require a large number of learning iterations, and poor learning would result in poor network performance. Choosing an optimum size/structure of the MLP network for a given task remains an open problem. This thesis proposes (i) a new MLP network architecture that provides faster convergence in the learning phase, (ii) a network pruning algorithm that provides an optimized network structure along with network convergence, and (iii) a method of improving network generalization. These features are shown to be effective in a neural-network-based speech recognizer for vowel and semivowel classification in speaker-independent continuous speech.

An analysis of the slow convergence of the backpropagation algorithm reveals the significance of presenting the output error in vector form and propagating it back vectorially. It is shown that such a mode of propagation, in addition to providing faster convergence, also converts the M-class recognition problem into M independent two-class recognition (dichotomizer) problems. For speech recognition and character recognition tasks, the new scheme is shown to converge within 20–50% of the iterations required by the MLP, with no degradation in recognition performance.

The convergence and generalization properties of the network are also related to its size/structure. Network pruning is a method in which a larger network is iteratively reduced to an optimum size. A dynamic link pruning algorithm has been developed and incorporated into the backpropagation algorithm so that an optimized network is obtained along with network convergence. In experiments with speech recognition and character recognition tasks, this optimization is shown to provide a 2–7% improvement in recognition performance.

Network generalization is determined by the feature-space partitions formed after learning. A generalization algorithm is proposed which, under suitable conditions, maximizes the volume of the decision regions formed by the network in the feature space. Such decision-region expansion leads to improved test-set performance.

Incorporating all the features described above, an optimized neural network dichotomizer is developed for vowel and semivowel classification in continuous speech. The results obtained are encouraging compared with those available in the literature for talker-independent semivowel classification in continuous speech using the TIMIT database.
dc.language.iso: en_US
dc.relation.ispartofseries: T03532
dc.rights: I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.
dc.subject: Multilayer Perceptron
dc.subject: Backpropagation Learning
dc.subject: Generalization Improvement
dc.title: Optimized neural network dichotomizer for speech recognition
dc.type: Thesis
dc.degree.name: MSc Engg
dc.degree.level: Masters
dc.degree.grantor: Indian Institute of Science
dc.degree.discipline: Engineering
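
The abstract above describes converting the M-class recognition problem into M independent two-class (dichotomizer) problems. The sketch below is only a minimal illustration of that one-vs-rest decomposition, written in plain NumPy: each class gets its own small MLP with a single sigmoid output, trained with ordinary squared-error backpropagation. The class name Dichotomizer, the hidden-layer size, the learning rate, and the epoch count are assumptions made for illustration; this is not the thesis's actual architecture or its vectorial error-propagation scheme.

# One-vs-rest decomposition sketch: an M-class problem handled by M independent
# two-class MLPs ("dichotomizers"), each trained with plain backpropagation on a
# single sigmoid output.  Hypothetical illustration only; sizes, learning rate
# and epoch count are placeholders, not the thesis's settings.
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

class Dichotomizer:
    """One-hidden-layer MLP with a single output unit (class vs. rest)."""

    def __init__(self, n_in, n_hidden, lr=0.1, seed=0):
        rng = np.random.default_rng(seed)
        self.W1 = rng.normal(0.0, 0.5, (n_in, n_hidden))
        self.b1 = np.zeros(n_hidden)
        self.W2 = rng.normal(0.0, 0.5, (n_hidden, 1))
        self.b2 = np.zeros(1)
        self.lr = lr

    def forward(self, x):
        self.h = sigmoid(x @ self.W1 + self.b1)       # hidden activations
        self.y = sigmoid(self.h @ self.W2 + self.b2)  # class-membership score
        return self.y

    def backprop(self, x, target):
        y = self.forward(x)
        # Deltas for squared error with sigmoid units.
        d_out = (y - target) * y * (1.0 - y)
        d_hid = (d_out @ self.W2.T) * self.h * (1.0 - self.h)
        # Gradient-descent weight updates.
        self.W2 -= self.lr * np.outer(self.h, d_out)
        self.b2 -= self.lr * d_out
        self.W1 -= self.lr * np.outer(x, d_hid)
        self.b1 -= self.lr * d_hid

def train_one_vs_rest(X, labels, n_classes, n_hidden=8, epochs=50):
    """Train one independent dichotomizer per class (one-vs-rest targets)."""
    nets = [Dichotomizer(X.shape[1], n_hidden, seed=c) for c in range(n_classes)]
    for c, net in enumerate(nets):
        targets = (labels == c).astype(float)  # 1 for class c, 0 for the rest
        for _ in range(epochs):
            for x, t in zip(X, targets):
                net.backprop(x, t)
    return nets

def classify(nets, x):
    """Pick the class whose dichotomizer responds most strongly."""
    return int(np.argmax([net.forward(x)[0] for net in nets]))

Because the M networks are trained independently, each one is driven only by its own two-class target, which mirrors the decomposition described in the abstract; at test time the class whose dichotomizer responds most strongly is chosen.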


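The abstract also mentions a dynamic link pruning algorithm incorporated into backpropagation so that an optimized structure is obtained along with convergence. The fragment below sketches only the general idea of pruning interleaved with training, reusing the hypothetical Dichotomizer above: a simple weight-magnitude criterion with a placeholder threshold stands in for the thesis's pruning rule, which this record does not specify.

# Sketch of pruning interleaved with training: every few epochs, links whose
# weight magnitude stays below a threshold are masked out and held at zero for
# the rest of training, so the surviving structure and the final weights are
# found together.  The magnitude criterion and threshold are placeholders; the
# thesis's dynamic link-pruning rule is not described in this record.
import numpy as np

def prune_step(W, mask, threshold=0.05):
    """Permanently mask links whose weight magnitude falls below the threshold."""
    mask &= np.abs(W) >= threshold
    return W * mask, mask

def train_with_pruning(net, X, targets, epochs=100, prune_every=10):
    """Run backpropagation on net and apply link pruning periodically."""
    masks = {"W1": np.ones_like(net.W1, dtype=bool),
             "W2": np.ones_like(net.W2, dtype=bool)}
    for epoch in range(epochs):
        for x, t in zip(X, targets):
            net.backprop(x, t)
        # Re-apply the masks so previously pruned links stay removed.
        net.W1 *= masks["W1"]
        net.W2 *= masks["W2"]
        if (epoch + 1) % prune_every == 0:
            net.W1, masks["W1"] = prune_step(net.W1, masks["W1"])
            net.W2, masks["W2"] = prune_step(net.W2, masks["W2"])
    return net

Any of the one-vs-rest networks from the previous sketch can be passed to train_with_pruning; the prune_every interval and the 0.05 threshold are arbitrary illustration values.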