Bimodal Projections Based Features for High Dimensional Pattern Classification
Abstract
Classification tasks involving high-dimensional vectors suffer from the curse of dimensionality, which demands large amounts of training data. This is because a high-dimensional space containing only a modest number of samples is mostly empty. To overcome this, we employ the principle of Projection Pursuit, which is motivated by the search for clusters in high-dimensional space. It has been observed that, for most high-dimensional “clouds” of data, most low-dimensional projections are approximately normal. Projection directions that result in non-Gaussian distributions of the projected data points are therefore rare, and hence interesting. Important structure in the data is concentrated along these directions, so they can serve as effective feature extractors. To exploit this, the data points are projected onto a suitable direction and the search for clusters is carried out in this one-dimensional projection space; as a result, the inherent sparsity of the high-dimensional space is avoided. The principle concentrates on projections that allow discrimination between clusters rather than a faithful representation of the data. This is in contrast to methods such as principal component analysis, which tend to combine features that may be highly correlated. Classical discriminant analysis methods also seek clusters, but they require class labels to be specified.

One such technique, Fisher’s Linear Discriminant (FLD), has been used to arrive at an algorithm that seeks bimodal projection directions. Although the FLD is a supervised classification technique, our feature extraction method is in fact unsupervised. An artificial two-class problem is presented to the FLD by assigning labels randomly to the points in the data set. The FLD then tries to orient the classifier so that the two classes are well separated, and this tends to assist the detection of bimodal projection directions.

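The abstract specifies only that random labels are presented to the FLD. The sketch below fills in one plausible realisation, in which the labelling and the FLD direction refine each other and the one-dimensional projection is split at its median; the iteration scheme, the median split, and the scatter-matrix regularisation are assumptions made purely for illustration, not necessarily the procedure used in the thesis.

import numpy as np

def fld_direction(X, labels):
    # Fisher's Linear Discriminant direction for a two-class labelling of X.
    X0, X1 = X[labels == 0], X[labels == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter, lightly regularised so the solve stays well conditioned.
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    Sw += 1e-6 * np.eye(X.shape[1])
    w = np.linalg.solve(Sw, mu1 - mu0)
    return w / np.linalg.norm(w)

def bimodal_direction(X, n_iter=20, seed=None):
    # Present an artificial two-class problem (random labels) to the FLD and let
    # the labelling and the direction refine each other (assumed scheme).
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=len(X))
    w = fld_direction(X, labels)
    for _ in range(n_iter):
        z = X @ w                                    # one-dimensional projection
        new_labels = (z > np.median(z)).astype(int)  # split at the median (assumption)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        w = fld_direction(X, labels)
    return w
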
The method achieves good dimensionality reduction. In addition, it naturally imposes a significant degree of weight sharing on the neural network that performs the feature extraction.

The objective of accurate encoding and the objective of generating discriminating representations that facilitate pattern classification are not necessarily consistent with each other. The feature vectors obtained using bimodal projection directions have good discriminating power but are not designed to address the issue of encoding. Consequently, this feature map is extended so that the extended feature map defines an embedding in the feature space. This ensures that a smooth inverse map exists from the feature space back to the input space and, as a result, the transformation is lossless. The dimension of the feature vector is then larger than that of the input vector; however, the number of weight vectors that must be stored to effect the feature extraction process is exactly the same as would have been required for the feature map without the extension.

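The abstract does not spell out how the extension is constructed. One minimal way to obtain an injective, smoothly invertible extension without storing any additional weight vectors is to append the raw input coordinates to the bimodal-projection features; the sketch below illustrates this assumed construction, which is not necessarily the one used in the thesis.

import numpy as np

def bimodal_features(X, W, g=np.tanh):
    # Discriminative features: a squashing nonlinearity applied to the
    # projections of X onto the bimodal directions (the columns of W).
    return g(X @ W)

def extended_features(X, W, g=np.tanh):
    # Extended feature map F(x) = (g(W^T x), x).  Appending the input coordinates
    # makes F injective, so a smooth inverse exists on its image (read off the
    # trailing block), while the only stored weight vectors are still the columns
    # of W.  This particular extension is an assumption made for illustration.
    return np.hstack([g(X @ W), X])
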
The NIST data set of handwritten digits was used to assess the applicability and efficiency of the features based on bimodal projection directions in handling large, high-dimensional data. The proposed feature extraction method constrains the number of free parameters (weights) in the network in several ways, improving the generalisation performance of the resulting neural networks. This has been convincingly demonstrated by the performance of the algorithm on the NIST data set. The method is also computationally efficient: although the available resources were minimal, reasonable results could be obtained for a data set consisting of 344,307 training images and 58,646 test images, each of size 32 × 32.

The embedding-based extension of the feature set provides a unified approach to both data classification and data representation. Although it results in feature vectors whose dimension is larger than that of the input space, it has been shown on the popular wine and vowel benchmarks that very simple classifiers give good performance. Thus, any increase in the complexity of the classification task due to the higher dimensionality of the classifier input is more than compensated by the fact that a simple classifier with few trainable parameters can achieve the desired classification performance.

