Machine Recognition of Printed Kannada Text
Abstract
Optical Character Recognition (OCR) offers a fast and efficient solution to many real-world problems, such as reading application forms, processing bank cheques, and sorting postal mail, tasks whose manual operation is labor-intensive and time-consuming. Most research efforts in OCR, as well as commercial software packages, focus on the Roman script; for Indian scripts, automatic recognition remains an unsolved and challenging problem. This thesis presents the design of a full-fledged OCR system for printed Kannada text.
Machine recognition of Kannada characters is difficult, even in printed text, owing to the similarity of character shapes, the complexity of the script, and the non-unique representation of diacritics.
OCR consists of the following subtasks:
Preprocessing: Enhancing desirable properties of the document image, including binarization and skew detection and correction.
Segmentation: Determining the bounding boxes of individual characters.
Feature Extraction: Extracting information from character images that is most relevant for classification.
Classification: Assigning class labels to feature vectors from a predefined set of classes.
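To make the preprocessing stage concrete, the sketch below shows a standard global binarization step using Otsu's threshold, implemented in pure Python on a toy grayscale image. This is a generic textbook method, not the connectivity-preserving global/local scheme contributed by the thesis; the function names and the 8-bit intensity range are assumptions made for illustration.

```python
def otsu_threshold(gray):
    """Return the gray level maximizing between-class variance (Otsu's method)."""
    hist = [0] * 256                      # 8-bit intensity histogram (assumed range)
    for row in gray:
        for px in row:
            hist[px] += 1
    total = sum(hist)
    sum_all = sum(i * h for i, h in enumerate(hist))
    sum_bg = 0.0
    w_bg = 0
    best_t, best_var = 0, -1.0
    for t in range(256):
        w_bg += hist[t]                   # background = pixels with value <= t
        if w_bg == 0:
            continue
        w_fg = total - w_bg
        if w_fg == 0:
            break
        sum_bg += t * hist[t]
        mu_bg = sum_bg / w_bg
        mu_fg = (sum_all - sum_bg) / w_fg
        var = w_bg * w_fg * (mu_bg - mu_fg) ** 2
        if var > best_var:
            best_var, best_t = var, t
    return best_t

def binarize(gray, t):
    """Map pixels to 1 (ink, dark) or 0 (background, light)."""
    return [[1 if px <= t else 0 for px in row] for row in gray]
```

On a bimodal image (dark text on a light page), the returned threshold falls between the two intensity clusters, so `binarize` separates ink from background.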
In this research, existing algorithms were implemented for these tasks, with modifications suggested to improve system performance. A dataset containing over 40,000 Kannada characters of various font styles and sizes was collected from scanned magazine images. Three training sets were used:
Basic characters (vowels + consonants): 37 classes, 30 samples per class.
Vowel modifiers: 8 classes, 20 samples per class.
Consonant conjuncts: 27 classes, 20 samples per class.
Key contributions include:
A connectivity-preserving binarization technique that combines global and local adaptive processing using connectivity as a measure.
A skew correction method that rotates the original grayscale image before binarization, minimizing visual distortion.
A novel segmentation scheme for Kannada characters using horizontal and vertical projections, morphological dilation, baseline estimation, and connected component analysis.
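The projection-profile idea underlying the segmentation scheme can be sketched as follows: summing ink pixels column-wise over a binarized text line and splitting at blank columns yields candidate character boundaries. This illustrates only the projection step; the thesis's full scheme additionally uses morphological dilation, baseline estimation, and connected component analysis. Function names are illustrative.

```python
def vertical_projection(binary):
    """Column-wise ink counts for a binary line image (1 = ink)."""
    return [sum(col) for col in zip(*binary)]

def segment_columns(binary):
    """Split a text line into [start, end) column ranges at blank columns."""
    proj = vertical_projection(binary)
    segments, start = [], None
    for x, v in enumerate(proj):
        if v > 0 and start is None:       # entering an inked run
            start = x
        elif v == 0 and start is not None:  # leaving an inked run
            segments.append((start, x))
            start = None
    if start is not None:                 # run extends to the right edge
        segments.append((start, len(proj)))
    return segments
```

A horizontal projection over rows works the same way for separating text lines before column segmentation is applied.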
Various features were evaluated for character recognition, including:
Transform-based features: Discrete Cosine Transform (DCT), Discrete Wavelet Transform (DWT), and Karhunen-Loeve Transform (KLT).
Structural features: Used to distinguish similar/confusing characters via a three-level hierarchical classifier.
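As an example of a transform-based feature, the sketch below computes an orthonormal 2-D DCT of a character block and keeps the low-frequency corner as the feature vector, since most of the image energy concentrates there. This is a minimal pure-Python illustration, not the thesis's exact feature pipeline; the block size, the choice of a square low-frequency corner, and the function names are assumptions.

```python
import math

def dct1d(v):
    """Orthonormal DCT-II of a 1-D sequence."""
    n = len(v)
    out = []
    for k in range(n):
        s = sum(v[i] * math.cos(math.pi * (2 * i + 1) * k / (2 * n))
                for i in range(n))
        scale = math.sqrt(1 / n) if k == 0 else math.sqrt(2 / n)
        out.append(scale * s)
    return out

def dct2d(block):
    """Separable 2-D DCT: transform rows, then columns."""
    rows = [dct1d(r) for r in block]
    cols = [dct1d(list(c)) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]

def dct_features(block, k=4):
    """Flatten the k x k low-frequency corner as the feature vector."""
    coeffs = dct2d(block)
    return [coeffs[i][j] for i in range(k) for j in range(k)]
```

For a uniform block, all energy lands in the DC coefficient, which is a quick sanity check that the transform is behaving as expected.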
Classifiers used include:
Nearest Neighbor (NN)
Artificial Neural Networks (ANN): Backpropagation Network (BPN) and Radial Basis Function Network (RBFN)
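The Nearest Neighbor classifier used here is simple enough to sketch directly: each test feature vector receives the label of the closest training vector under Euclidean distance. The function names and the use of Euclidean distance are assumptions for this generic illustration; the ANN classifiers (BPN, RBFN) are not shown.

```python
def euclidean(a, b):
    """Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) ** 0.5

def nn_classify(sample, train_vectors, train_labels):
    """Assign the label of the closest training vector to the sample."""
    best = min(range(len(train_vectors)),
               key=lambda i: euclidean(sample, train_vectors[i]))
    return train_labels[best]
```

Because NN stores the full training set and compares against every sample at test time, it needs no training phase, which makes it a convenient baseline against which the neural classifiers can be compared.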
The system was tested on over 1,400 characters with varied fonts and sizes. Combining structural and transform features improved recognition rates from 92% to 98%. Among wavelet features, Haar wavelets achieved the highest recognition rate of 98.8% using NN. DCT and KLT achieved 97% and 98%, respectively. RBFN achieved 98.9%, while BPN reached 97% accuracy.