Bimodal Projections Based Features for High Dimensional Pattern Classification
Abstract
Classification tasks involving high-dimensional vectors suffer from the curse of dimensionality, which demands large amounts of training data. This is because a high-dimensional space containing only a modest number of samples is mostly empty. To overcome this, we employ the principle of Projection Pursuit, which is motivated by the search for clusters in high-dimensional space. It has been observed that, for most high-dimensional “clouds” of data, most low-dimensional projections are approximately normal. Projection directions that result in non-Gaussian distributions of the projected data points are therefore rare, and hence interesting. Important structure in the data is concentrated along these directions, so they can serve as effective feature extractors. To exploit this, the data points are projected onto a suitable direction and the search for clusters is carried out in this one-dimensional projection space; as a result, the inherent sparsity of the high-dimensional space is avoided. The principle concentrates on projections that allow discrimination between clusters rather than a faithful representation of the data. This is in contrast to methods such as principal component analysis, which tend to combine features that may be highly correlated. Classical discriminant analysis methods also seek clusters, but they require class labels to be specified.

One such technique, Fisher’s Linear Discriminant (FLD), has been used to arrive at an algorithm that seeks bimodal projection directions. Although the FLD is a supervised classification technique, our feature extraction method is in fact unsupervised. An artificial two-class problem is presented to the FLD by assigning labels randomly to the points in the data set. The FLD then tries to orient the classifier so that the two classes are well separated, and this tends to assist the detection of bimodal projection directions.

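The abstract specifies only that random labels are presented to the FLD. The sketch below fills in one plausible realisation, in which the labelling and the FLD direction refine each other and the one-dimensional projection is split at its median; the iteration scheme, the median split, and the scatter-matrix regularisation are assumptions made purely for illustration, not necessarily the procedure used in the thesis.

import numpy as np

def fld_direction(X, labels):
    # Fisher's Linear Discriminant direction for a two-class labelling of X.
    X0, X1 = X[labels == 0], X[labels == 1]
    mu0, mu1 = X0.mean(axis=0), X1.mean(axis=0)
    # Within-class scatter, lightly regularised so the solve stays well conditioned.
    Sw = (X0 - mu0).T @ (X0 - mu0) + (X1 - mu1).T @ (X1 - mu1)
    Sw += 1e-6 * np.eye(X.shape[1])
    w = np.linalg.solve(Sw, mu1 - mu0)
    return w / np.linalg.norm(w)

def bimodal_direction(X, n_iter=20, seed=None):
    # Present an artificial two-class problem (random labels) to the FLD and let
    # the labelling and the direction refine each other (assumed scheme).
    rng = np.random.default_rng(seed)
    labels = rng.integers(0, 2, size=len(X))
    w = fld_direction(X, labels)
    for _ in range(n_iter):
        z = X @ w                                    # one-dimensional projection
        new_labels = (z > np.median(z)).astype(int)  # split at the median (assumption)
        if np.array_equal(new_labels, labels):
            break
        labels = new_labels
        w = fld_direction(X, labels)
    return w
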
The method achieves good dimensionality reduction. In addition, it naturally imposes a significant degree of weight sharing on the neural network that performs the feature extraction.

The objective of accurate encoding and the objective of generating discriminating representations that facilitate pattern classification are not necessarily consistent with each other. The feature vectors obtained using bimodal projection directions have good discriminating power but are not designed to address the issue of encoding. Consequently, this feature map is extended so that the extended feature map defines an embedding in the feature space. This ensures that a smooth inverse map exists from the feature space back to the input space and, as a result, the transformation is lossless. The dimension of the feature vector is then larger than that of the input vector; however, the number of weight vectors that must be stored to effect the feature extraction process is exactly the same as would have been required for the feature map without the extension.

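The abstract does not spell out how the extension is constructed. One minimal way to obtain an injective, smoothly invertible extension without storing any additional weight vectors is to append the raw input coordinates to the bimodal-projection features; the sketch below illustrates this assumed construction, which is not necessarily the one used in the thesis.

import numpy as np

def bimodal_features(X, W, g=np.tanh):
    # Discriminative features: a squashing nonlinearity applied to the
    # projections of X onto the bimodal directions (the columns of W).
    return g(X @ W)

def extended_features(X, W, g=np.tanh):
    # Extended feature map F(x) = (g(W^T x), x).  Appending the input coordinates
    # makes F injective, so a smooth inverse exists on its image (read off the
    # trailing block), while the only stored weight vectors are still the columns
    # of W.  This particular extension is an assumption made for illustration.
    return np.hstack([g(X @ W), X])
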
The NIST data set of handwritten digits was used to assess the applicability and efficiency of the features based on bimodal projection directions in handling large, high-dimensional data. The proposed feature extraction method constrains the number of free parameters (weights) in the network in several ways, improving the generalisation performance of the resulting neural networks. This has been convincingly demonstrated by the performance of the algorithm on the NIST data set. The method is also computationally efficient: although the available resources were minimal, reasonable results could be obtained for a data set consisting of 344,307 training images and 58,646 test images, each of size 32 × 32.

The embedding-based extension of the feature set provides a unified approach to both data classification and data representation. Although it results in feature vectors whose dimension is larger than that of the input space, it has been shown on the popular wine and vowel benchmarks that very simple classifiers give good performance. Thus, any increase in the complexity of the classification task due to the higher dimensionality of the classifier input is more than compensated by the fact that a simple classifier with few trainable parameters can achieve the desired classification performance.

