Optimized neural network dichotomizer for speech recognition
Abstract
Among the various neural network architectures and learning algorithms that
have emerged recently, the multilayer perceptron (MLP) network trained with
backpropagation has been found most effective for speech recognition, owing to its
ability to form arbitrarily complex decision regions in the feature space and to the
overall versatility of the backpropagation learning algorithm. This thesis addresses the
issues involved in applying MLPs to phoneme recognition in continuous speech and
proposes modifications to the network architecture and learning algorithm that give
improved performance.
Feature representation of speech units, such as phonemes, is complicated by
talker variability and contextual variability in continuous speech. This makes pattern
classification difficult: a typical MLP classifier requires a large number of learning
iterations, and incomplete learning results in poor recognition performance. Choosing
an optimal size and structure for the MLP network for a given task also remains an
open problem. This thesis proposes (i) a new MLP network architecture that provides
faster convergence in the learning phase, (ii) a network pruning algorithm that yields
an optimized network structure along with network convergence, and (iii) a method
for improving network generalization. These features are shown to be effective in a
neural network-based speech recognizer for vowel and semivowel classification in
speaker-independent continuous speech.
An analysis of the slow convergence of the backpropagation algorithm reveals
the significance of presenting the output error in vector form and propagating it back
vectorially. Such a mode of propagation, in addition to providing faster convergence,
converts the M-class recognition problem into M independent two-class recognition
(dichotomizer) problems. On speech recognition and character recognition tasks, the
new scheme is shown to converge within 20–50% of the iterations required by the
standard MLP, with no degradation in recognition performance.
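The vectorial propagation equations themselves are not given here; as a rough
illustration of the resulting decomposition only, the following Python sketch trains
one independent two-class network per class with a one-vs-rest target. All names in it
(Dichotomizer, train_dichotomizers, n_hidden, lr) are illustrative rather than taken
from the thesis.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    class Dichotomizer:
        # One two-class MLP with a single hidden layer. Each class gets
        # its own network, so the M-class problem becomes M independent
        # dichotomizer problems.
        def __init__(self, n_in, n_hidden, lr=0.1, seed=0):
            rng = np.random.default_rng(seed)
            self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_in))
            self.W2 = rng.normal(0.0, 0.1, (1, n_hidden))
            self.lr = lr

        def forward(self, x):
            self.h = sigmoid(self.W1 @ x)       # hidden activations
            self.y = sigmoid(self.W2 @ self.h)  # scalar class score
            return self.y

        def backward(self, x, target):
            # Scalar output error for this class alone; errors for the
            # other classes never mix into these weights.
            d_out = (self.y - target) * self.y * (1.0 - self.y)
            d_hid = (self.W2.T @ d_out) * self.h * (1.0 - self.h)
            self.W2 -= self.lr * np.outer(d_out, self.h)
            self.W1 -= self.lr * np.outer(d_hid, x)

    def train_dichotomizers(X, labels, n_classes, n_hidden=8, epochs=50):
        nets = [Dichotomizer(X.shape[1], n_hidden, seed=k)
                for k in range(n_classes)]
        for _ in range(epochs):
            for x, lab in zip(X, labels):
                for k, net in enumerate(nets):
                    net.forward(x)
                    net.backward(x, 1.0 if k == lab else 0.0)
        return nets

    def classify(nets, x):
        # The class whose dichotomizer responds most strongly wins.
        return int(np.argmax([net.forward(x)[0] for net in nets]))

In the thesis the decomposition arises from the mode of error propagation itself;
building M literal networks, as above, is simply the plainest way to exhibit the
independence of the M two-class problems.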
The convergence and generalization properties of the network are also related
to its size and structure. Network pruning is a method in which a larger network is
iteratively reduced to an optimal size. A dynamic link pruning algorithm has been
developed and incorporated into the backpropagation algorithm, so that an optimized
network is obtained along with network convergence. In experiments with speech
recognition and character recognition tasks, this optimization is shown to provide a
2–7% improvement in recognition performance. Network generalization is determined
by the feature space partitions formed after learning. A network generalization
algorithm is proposed which, under suitable conditions, maximizes the volume of the
decision regions formed by the network in the feature space. Such decision region
expansion leads to improved test set performance.
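The dynamic pruning criterion itself is not given here; the following minimal sketch
shows how pruning can be interleaved with backpropagation updates, assuming a
simple weight-magnitude criterion as a stand-in for the thesis's rule. The function
name masked_update and its parameters are illustrative.

    import numpy as np

    def masked_update(W, grad, mask, lr=0.1, threshold=0.01):
        # One backpropagation weight update with pruning interleaved.
        # `mask` (boolean array) marks the links still present; pruned
        # links stay at zero because their updates are masked out.
        W = (W - lr * grad) * mask
        # Illustrative criterion (assumed, not the thesis's rule):
        # drop links whose weights have decayed below a threshold.
        mask = mask & (np.abs(W) >= threshold)
        return W * mask, mask

Applied every few updates to a deliberately oversized network, a step of this kind
removes redundant links as training proceeds, so the optimized structure emerges
together with convergence.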
Incorporating all the features described above, an optimized neural network
dichotomizer is developed for vowel and semivowel classification in continuous
speech. The results obtained are encouraging in comparison with those reported in the
literature for talker-independent semivowel classification in continuous speech using
the TIMIT database.

