Optimized neural network dichotomizer for speech recognition
Abstract
Among the various neural network architectures and learning algorithms that
have emerged recently, the multilayer perceptron (MLP) network trained with
backpropagation has been found most effective for speech recognition, owing to its
ability to form arbitrarily complex decision regions in the feature space and to the
overall versatility of the backpropagation learning algorithm. This thesis addresses the
issues involved in applying MLPs to phoneme recognition in continuous speech and
proposes modifications to the network architecture and learning algorithm that give
improved performance.
Feature representation of speech units, such as phonemes, is complicated by
talker variability and contextual variability in continuous speech. This makes pattern
classification difficult: a typical MLP classifier requires a large number of learning
iterations, and incomplete learning results in poor recognition performance. Choosing
an optimal size and structure for the MLP network for a given task also remains an
open problem. This thesis proposes (i) a new MLP network architecture that provides
faster convergence in the learning phase, (ii) a network pruning algorithm that yields
an optimized network structure along with network convergence, and (iii) a method
for improving network generalization. These features are shown to be effective in a
neural network-based speech recognizer for vowel and semivowel classification in
speaker-independent continuous speech.
An analysis of the slow convergence of the backpropagation algorithm reveals
the significance of presenting the output error in vector form and propagating it back
vectorially. Such a mode of propagation, in addition to providing faster convergence,
converts the M-class recognition problem into M independent two-class recognition
(dichotomizer) problems. On speech recognition and character recognition tasks, the
new scheme is shown to converge within 20–50% of the iterations required by the
standard MLP, with no degradation in recognition performance.
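The vectorial propagation equations themselves are not given here; as a rough
illustration of the resulting decomposition only, the following Python sketch trains
one independent two-class network per class with a one-vs-rest target. All names in it
(Dichotomizer, train_dichotomizers, n_hidden, lr) are illustrative rather than taken
from the thesis.

    import numpy as np

    def sigmoid(z):
        return 1.0 / (1.0 + np.exp(-z))

    class Dichotomizer:
        # One two-class MLP with a single hidden layer. Each class gets
        # its own network, so the M-class problem becomes M independent
        # dichotomizer problems.
        def __init__(self, n_in, n_hidden, lr=0.1, seed=0):
            rng = np.random.default_rng(seed)
            self.W1 = rng.normal(0.0, 0.1, (n_hidden, n_in))
            self.W2 = rng.normal(0.0, 0.1, (1, n_hidden))
            self.lr = lr

        def forward(self, x):
            self.h = sigmoid(self.W1 @ x)       # hidden activations
            self.y = sigmoid(self.W2 @ self.h)  # scalar class score
            return self.y

        def backward(self, x, target):
            # Scalar output error for this class alone; errors for the
            # other classes never mix into these weights.
            d_out = (self.y - target) * self.y * (1.0 - self.y)
            d_hid = (self.W2.T @ d_out) * self.h * (1.0 - self.h)
            self.W2 -= self.lr * np.outer(d_out, self.h)
            self.W1 -= self.lr * np.outer(d_hid, x)

    def train_dichotomizers(X, labels, n_classes, n_hidden=8, epochs=50):
        nets = [Dichotomizer(X.shape[1], n_hidden, seed=k)
                for k in range(n_classes)]
        for _ in range(epochs):
            for x, lab in zip(X, labels):
                for k, net in enumerate(nets):
                    net.forward(x)
                    net.backward(x, 1.0 if k == lab else 0.0)
        return nets

    def classify(nets, x):
        # The class whose dichotomizer responds most strongly wins.
        return int(np.argmax([net.forward(x)[0] for net in nets]))

In the thesis the decomposition arises from the mode of error propagation itself;
building M literal networks, as above, is simply the plainest way to exhibit the
independence of the M two-class problems.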
The convergence and generalization properties of the network are also related
to its size and structure. Network pruning is a method in which a larger network is
iteratively reduced to an optimal size. A dynamic link pruning algorithm has been
developed and incorporated into the backpropagation algorithm, so that an optimized
network is obtained along with network convergence. In experiments with speech
recognition and character recognition tasks, this optimization is shown to provide a
2–7% improvement in recognition performance. Network generalization is determined
by the feature space partitions formed after learning. A network generalization
algorithm is proposed which, under suitable conditions, maximizes the volume of the
decision regions formed by the network in the feature space. Such decision region
expansion leads to improved test set performance.
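The dynamic pruning criterion itself is not given here; the following minimal sketch
shows how pruning can be interleaved with backpropagation updates, assuming a
simple weight-magnitude criterion as a stand-in for the thesis's rule. The function
name masked_update and its parameters are illustrative.

    import numpy as np

    def masked_update(W, grad, mask, lr=0.1, threshold=0.01):
        # One backpropagation weight update with pruning interleaved.
        # `mask` (boolean array) marks the links still present; pruned
        # links stay at zero because their updates are masked out.
        W = (W - lr * grad) * mask
        # Illustrative criterion (assumed, not the thesis's rule):
        # drop links whose weights have decayed below a threshold.
        mask = mask & (np.abs(W) >= threshold)
        return W * mask, mask

Applied every few updates to a deliberately oversized network, a step of this kind
removes redundant links as training proceeds, so the optimized structure emerges
together with convergence.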
Incorporating all the features described above, an optimized neural network
dichotomizer is developed for vowel and semivowel classification in continuous
speech. The results obtained are encouraging in comparison with those reported in the
literature for talker-independent semivowel classification in continuous speech using
the TIMIT database.

