Pattern representation and prototype selection for handwritten digit recognition
Abstract
In this work, we present three independent ideas for increasing classification accuracy with the nearest neighbour classifier. These ideas are:
(i) combination of decisions using different representation schemes,
(ii) redundancy removal using bootstrapped patterns, and
(iii) selection of both features and prototypes simultaneously.
The first idea is the combination of classifiers. We propose a method to improve classification accuracy by combining the decisions of classifiers that employ different representation schemes, motivated by the well-known fact that pattern representation plays a crucial role in pattern recognition. We therefore study the effect of combining the decisions of different representation schemes using a single classifier, the nearest neighbour classifier. Several combination schemes proposed in the literature focus on new methods for arriving at a final conclusion given the decisions of individual classifiers. To make the final decision, we use the majority voting scheme, as it is simple and effective.
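This combination step can be sketched as follows: a nearest neighbour decision is taken in each representation space, and the final label is the majority vote. The function names and the toy distance metric (Euclidean) are illustrative assumptions, not the paper's exact implementation.

```python
import numpy as np
from collections import Counter

def nn_predict(train_X, train_y, x):
    """1-NN decision: label of the closest training pattern
    (Euclidean distance assumed for illustration)."""
    dists = np.linalg.norm(train_X - x, axis=1)
    return train_y[np.argmin(dists)]

def combined_predict(representations, train_y, test_reps):
    """Majority vote over the 1-NN decisions obtained in each
    representation space; `representations` holds one training
    matrix per scheme, `test_reps` the test pattern in each."""
    votes = [nn_predict(tr, train_y, te)
             for tr, te in zip(representations, test_reps)]
    label, _ = Counter(votes).most_common(1)[0]
    return label
```

Using an odd number of representation schemes avoids ties in the vote.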
In our second idea, we propose a method to improve classification accuracy and remove redundancy from the training dataset using the bootstrapping technique. Redundancy is removed by bootstrapping the original data and discarding boundary patterns, which are identified from the first k nearest neighbours of a given pattern. Additionally, we propose a method to distinguish boundary patterns from non-boundary patterns using the first h nearest neighbours.
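A minimal sketch of this idea, under two assumptions that go beyond the abstract: bootstrapping is taken to mean replacing each pattern with the mean of its k nearest same-class neighbours (one common variant), and a pattern is treated as a boundary pattern when any of its first h neighbours belongs to a different class.

```python
import numpy as np

def bootstrap(X, y, k=3):
    """Replace each pattern by the mean of its k nearest same-class
    neighbours (an assumed bootstrapping variant, not necessarily
    the paper's exact procedure)."""
    out = np.empty_like(X, dtype=float)
    for i in range(len(X)):
        same = np.where(y == y[i])[0]
        d = np.linalg.norm(X[same] - X[i], axis=1)
        nn = same[np.argsort(d)[:k]]
        out[i] = X[nn].mean(axis=0)
    return out

def is_boundary(X, y, i, h):
    """Assumed criterion: pattern i is a boundary pattern if any of
    its first h nearest neighbours has a different class label."""
    dists = np.linalg.norm(X - X[i], axis=1)
    order = np.argsort(dists)[1:h + 1]  # skip the pattern itself
    return bool(np.any(y[order] != y[i]))

def prune_boundary(X, y, h=1):
    """Keep only the non-boundary patterns."""
    keep = [i for i in range(len(X)) if not is_boundary(X, y, i, h)]
    return X[keep], y[keep]
```

Patterns deep inside a class region survive the pruning, while patterns whose neighbourhood is mixed are discarded.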
In the third method, we propose a way to select both features and prototypes simultaneously using Genetic Algorithms. In the literature, feature selection, which chooses an optimal subset of features from the training dataset, has been used to decrease computational time, reduce storage requirements, and increase classification accuracy. Storage and computational time can also be reduced by selecting representative patterns from the training dataset, a problem well known in the literature as prototype selection. Traditionally, prototype selection and feature selection have been treated as two independent optimization problems. Here, we propose a single scheme that performs both by optimizing a single criterion function, offering the flexibility of selecting an optimal feature subset and prototype set at the same time.
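The joint scheme can be sketched as a simple genetic algorithm over a single binary chromosome that concatenates a feature mask and a prototype mask. The fitness function used here (1-NN accuracy on a validation set) and all GA parameters are illustrative assumptions, not the paper's criterion function.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(mask, X, y, Xv, yv, n_feat):
    """1-NN validation accuracy using only the selected features and
    prototypes (an assumed criterion for illustration)."""
    f = mask[:n_feat].astype(bool)
    p = mask[n_feat:].astype(bool)
    if not f.any() or not p.any():
        return 0.0
    P, Py = X[p][:, f], y[p]
    correct = 0
    for xi, yi in zip(Xv[:, f], yv):
        d = np.linalg.norm(P - xi, axis=1)
        correct += int(Py[np.argmin(d)] == yi)
    return correct / len(yv)

def ga_select(X, y, Xv, yv, pop=20, gens=30, pm=0.05):
    """Evolve one chromosome encoding both a feature mask and a
    prototype mask; keeps the better half each generation and fills
    the rest with mutated single-point crossovers."""
    n_feat = X.shape[1]
    L = n_feat + X.shape[0]
    population = rng.integers(0, 2, size=(pop, L))
    for _ in range(gens):
        scores = np.array([fitness(c, X, y, Xv, yv, n_feat)
                           for c in population])
        parents = population[np.argsort(scores)[::-1][:pop // 2]]
        children = []
        while len(children) < pop - len(parents):
            a, b = parents[rng.integers(len(parents), size=2)]
            cut = rng.integers(1, L)
            child = np.concatenate([a[:cut], b[cut:]])
            child[rng.random(L) < pm] ^= 1  # bit-flip mutation
            children.append(child)
        population = np.vstack([parents, children])
    scores = np.array([fitness(c, X, y, Xv, yv, n_feat)
                       for c in population])
    best = population[np.argmax(scores)]
    return best[:n_feat].astype(bool), best[n_feat:].astype(bool)
```

Because one chromosome carries both masks, a single optimization run trades off feature count, prototype count, and accuracy together instead of solving two separate problems.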
The results obtained using our approach are observed to be better than those of the two independent approaches, both in reducing the dataset size and in retaining classification accuracy. In some cases, the classification accuracy is even observed to increase.

