dc.description.abstract | The central theme of this thesis is the study of the effects of costs on the design and operation of statistical pattern recognition schemes. An analysis of Bayesian pattern recognition including feature measurement costs is conducted, and it is shown that the optimal classifier is a multistage decision scheme. It prohibits measuring more than one feature per stage and does not reject classes at intermediate stages.
It is also shown that if the classifier is constrained to reject classes for other practical reasons, the set of individually best two classes for acceptance at any intermediate stage is not necessarily the best set of two classes. This Bayesian problem leads to a decision graph formulation of the optimal classifier. If the feature measurement costs correspond to their computational complexities, the minimum cost scheme is not directly suitable, as the balancing between the feature measurement and classification costs is offset by the complexity of computing decision functions during operation. A heuristic procedure to simplify the optimal scheme to a decision tree is proposed.
However, the design of the minimum cost classifier and its modification to a decision tree are computationally complex. Therefore, a heuristic criterion for discretizing the features is suggested. The solution of the discrete feature problem is carried out by dynamic programming, after explicitly ordering the state space of all outcomes of all feature subsets to suit the requirements of recursive computation. The resulting decision tree is implemented as an optimal binary tree with respect to a general criterion, again by dynamic programming.
Two important problems in speech processing are chosen to illustrate the proposed techniques:
Voiced-Unvoiced-Silence (V-U-S) classification of running speech.
Vowel recognition.
For the V-U-S decision problem, two types of trees are designed:
One by modifying the minimum cost decision regions.
The other by discretizing the features.
Both yield acceptable results, with the latter being much simpler for design.
In the case of problems such as spoken vowel recognition with a large number of correlated features, the derivation of a sufficient statistic of low dimensionality is required prior to the decision tree design. The vowels are modeled as outputs of autoregressive (AR) systems with normally distributed AR parameters, and a new distance measure is derived for their classification. The analysis yields a sufficient statistic of m normalized autocorrelation coefficients, where m is the order of autoregression. Associating equal computational costs for extracting those coefficients, a decision tree is designed after discretization. The resulting tree is operationally very efficient and yields attractive results on experimental data with independent design and test sets.
In medical diagnosis, the degree of certainty with which a disorder is required to be diagnosed bears a direct relationship with the therapeutic modality being considered. The processes of diagnosis and therapy are interlinked and are carried out in multiple stages. At every stage, the disorder probability is updated and that decision (action) which minimizes the overall expected cost is taken. Since the costs of all ordered pairs of class and action depend on several unknown quantities, they are modeled as random variables.
In this context, a mathematical problem of learning the optimum probability threshold for appropriate action is formulated for a two-class, two-action system. A recursive procedure for updating an estimate of the threshold is proposed. The adaptive scheme of using the present threshold estimate for taking action on the next sample pattern is shown to converge in probability to the optimum. Results of a computer simulation study of three learning schemes demonstrate the theoretically predictable salient features of the adaptive scheme. | |