Devanagari Online Handwritten Character Recognition
Abstract
In this thesis, a classifier is developed for recognition of Devanagari online handwritten
characters. The classifier is based on local (sub-unit level) and global (character level)
representations of a character, and uses features that are independent of variations in the
direction and order of strokes.
It is shown that an online character corresponding to a Devanagari ideal character can be analyzed
and uniquely represented in terms of homogeneous sub-structures called sub-units. These
sub-units can be extracted using the direction property of online strokes in an ideal character. A
method for extraction of sub-units from a handwritten character is developed, such that the
extracted sub-units are similar to the sub-units of the corresponding ideal character.
Features are developed that are independent of variations in order and direction of strokes in
characters. The features are called histograms of points, orientations, and dynamics of orientations
(HPOD) features. The extraction method spatially maps the co-ordinates of points, and the
orientations and dynamics of orientations of strokes at these points. Histograms of these mapped
features are computed in the different regions into which the spatial map is divided.
HPOD features extracted from the sub-units represent the character locally, and those
extracted from the character as a whole represent it globally.
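The idea behind such histogram features can be illustrated with a minimal sketch. It is not the
extraction method of the thesis: the function name `hpod_sketch`, the grid size, the number of bins,
the use of orientation modulo pi, and the reading of "dynamics of orientations" as the absolute
change in orientation between consecutive segments are all assumptions made for illustration. The
sketch only shows why pooling such quantities into spatial histograms removes the dependence on
stroke order and stroke direction.

```python
import numpy as np

def hpod_sketch(strokes, grid=3, n_bins=4):
    """Hypothetical HPOD-like feature vector (illustrative assumptions only).

    strokes: list of (N_i, 2) sequences of pen co-ordinates.
    Returns the concatenated histograms of points, orientations, and
    orientation changes over a grid x grid spatial map.
    """
    strokes = [np.asarray(s, dtype=float) for s in strokes]
    all_pts = np.vstack(strokes)
    lo = all_pts.min(axis=0)
    span = max(float((all_pts.max(axis=0) - lo).max()), 1e-12)

    def cell(p):
        # Flat index of the grid cell that contains each point.
        ij = np.minimum(((p - lo) / span * grid).astype(int), grid - 1)
        return ij[:, 1] * grid + ij[:, 0]

    h_pts = np.zeros(grid * grid)
    h_ori = np.zeros((grid * grid, n_bins))
    h_dyn = np.zeros((grid * grid, n_bins))

    for s in strokes:
        np.add.at(h_pts, cell(s), 1)            # histogram of points
        if len(s) < 2:
            continue
        d = np.diff(s, axis=0)                  # segment vectors
        # Flip segments into the upper half-plane so that a stroke drawn
        # in the opposite direction yields the same orientation.
        flip = (d[:, 1] < 0) | ((d[:, 1] == 0) & (d[:, 0] < 0))
        d[flip] *= -1
        theta = np.arctan2(d[:, 1], d[:, 0])    # orientation in [0, pi]
        o_bin = np.minimum((theta / np.pi * n_bins).astype(int), n_bins - 1)
        mid = (s[:-1] + s[1:]) / 2              # segment midpoints
        np.add.at(h_ori, (cell(mid), o_bin), 1)
        if len(s) < 3:
            continue
        # Assumed "dynamics": absolute orientation change at interior points,
        # folded into [0, pi/2].
        turn = np.abs(np.diff(theta))
        turn = np.minimum(turn, np.pi - turn)
        t_bin = np.minimum((turn / (np.pi / 2) * n_bins).astype(int), n_bins - 1)
        np.add.at(h_dyn, (cell(s[1:-1]), t_bin), 1)

    return np.concatenate([h_pts, h_ori.ravel(), h_dyn.ravel()])
```

Under these assumptions, calling the function on the strokes of one sub-unit gives a local
representation, and calling it on all strokes of the character gives a global one. Reversing a
stroke or permuting the strokes leaves the counts unchanged, because every count depends only on
where a point lies and on an undirected orientation.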
A classifier is developed that models handwritten characters in terms of the joint distribution
of the local and global HPOD features of the characters and the number of sub-units in the
characters. The classifier uses latent variables to model the structure of the sub-units.
The parameters of the model are estimated using the maximum likelihood method. The use
of HPOD features and the assumption of independent generation of the sub-units given the
number of sub-units make the classifier independent of variations in the direction and order of
strokes in characters. This sub-unit based classifier is called the SUB classifier.
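The effect of the independence assumption can be illustrated with a small sketch. It is not the
model of the thesis: the diagonal Gaussian densities, the mixture over latent sub-unit types, the
independence of global and local features given the class, and every name below (`class_log_score`,
`n_prior`, `weights`, and so on) are assumptions made for illustration. The sketch shows only that,
when sub-units are generated independently given their number, the class score is a sum over
sub-units and therefore does not depend on the order in which the sub-units were written.

```python
import numpy as np

def log_gauss_diag(x, mean, var):
    """Log density of a diagonal Gaussian, summed over the last axis."""
    return -0.5 * np.sum(np.log(2 * np.pi * var) + (x - mean) ** 2 / var, axis=-1)

def class_log_score(n_prior, g_mean, g_var, weights, means, variances, g, local_feats):
    """Hypothetical log-likelihood of one character under one class.

    n_prior:     array, n_prior[n] = assumed probability of n sub-units
    g_mean/var:  parameters of an assumed Gaussian over the global features
    weights:     (K,) mixture weights over K latent sub-unit types (assumed)
    means/variances: (K, D) parameters of the latent types (assumed)
    g:           (Dg,) global feature vector
    local_feats: list of (D,) local feature vectors, one per sub-unit
    """
    n = len(local_feats)
    score = np.log(n_prior[n]) + log_gauss_diag(g, g_mean, g_var)
    for l in local_feats:
        # Marginalize the latent type of this sub-unit (log-sum-exp over K).
        comp = np.log(weights) + log_gauss_diag(l, means, variances)
        score += np.logaddexp.reduce(comp)
    return score
```

A character would then be assigned to the class with the largest score. Because the loop only
accumulates a sum, permuting `local_feats` changes the result by at most floating-point rounding.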
Datasets for training and testing the classifiers consist of handwritten samples of Devanagari
vowels, consonants, half consonants, nasalization sign, vowel omission sign, vowel signs,
consonant with vowel sign, conjuncts, consonant clusters, and three more short strokes with
different shapes. In all, 96 different characters or symbols are considered
for recognition. The average numbers of samples per character class in the training and the test
sets are, respectively, 133 and 29. The smallest and the largest dimensions of the extracted
feature vectors are, respectively, 258 and 786. Since the size of the training set per class is not
large compared to the dimension of the extracted feature vectors, the training set is small from
the perspective of training any classifier. Classifiers that can be trained on a small data set are
therefore considered for performance comparison with the developed classifier.
Second order statistics (SOS), sub-space (SS), Fisher discriminant (FD), feedforward neural
network (FNN), and support vector machine (SVM) classifiers are the other classifiers considered.
They are trained with other features, namely spatio-temporal (ST), discrete Fourier transform
(DFT), discrete cosine transform (DCT), discrete wavelet transform (DWT), spatial (SP), and
histograms of oriented gradients (HOG) features, extracted from the samples of the training set,
and are tested with the same features extracted from the samples of the test set. Among these
combinations of classifiers and features, the SVM classifier trained with DFT features has the
highest accuracy on the test set, 90.2%. The accuracy
of the SUB classifier trained with HPOD features is 92.9% on the test set, which is the highest
among the accuracies of all the classifiers. The accuracies of the SOS, SS, FD, FNN,
and SVM classifiers increase when they are trained with HPOD features. The accuracy of the SVM
classifier trained with HPOD features is 92.9%, which is the highest among the accuracies of the
other classifiers trained with HPOD features.
The SUB classifier using HPOD features has the highest accuracy among the considered classifiers
trained with the considered features on the same training set and tested on the same test set.
The better character discriminative capability of the designed HPOD features is reflected by
the increase in the accuracies of the other classifiers when trained with these features.