Machine recognition of printed Odiya text

Pati, Peeta Basa

dc.contributor.advisor	Ramakrishnan, A G
dc.contributor.author	Pati, Peeta Basa
dc.date.accessioned	2025-10-15T11:09:16Z
dc.date.available	2025-10-15T11:09:16Z
dc.date.submitted	2000
dc.identifier.uri	https://etd.iisc.ac.in/handle/2005/7182
dc.description.abstract	Automatic recognition of characters by a machine is one of the challenging problems in Artificial Intelligence. The motivation for designing such a machine comes from the human visual system (HVS). The HVS is remarkably versatile: it is a neural realization of a pattern recognition system whose performance is not affected by geometric transformations of patterns, such as characters of various styles and sizes. The primary goal of such a machine is to replace the HVS in practical applications involving repetitive, monotonous tasks, such as mass digitization of printed documents and the processing of postal mail, job applications, and banking documents. Most research efforts and commercial software packages focus on the Roman script; for Indian scripts, automatic recognition is still a topic of considerable research interest. This thesis presents an attempt to develop an integrated Optical Character Recognition (OCR) system for printed Odiya script. Automatic document recognition comprises the following major subtasks: (1) digitization, which converts hard-copy documents into digital images that can be processed on a computer; (2) preprocessing, which involves noise removal, skew detection and correction, and binarization of the gray-level digital image; (3) segmentation, which separates the preprocessed image hierarchically into lines, then words, then characters; (4) feature extraction, which obtains from each character the attributes, called features, that distinguish it from other characters; and (5) classification, which uses the extracted features to decide the class to which a test pattern belongs. In this thesis, a novel binarization technique based on windows of variable width is developed and implemented.
The window width is selected according to the local statistics of the image. Document skew is detected by a precise two-level skew detection algorithm that employs the Hough transform and statistical properties of the image. Individual text lines are segmented using horizontal projection vectors, and words are separated from lines using vertical projection vectors. The segmented words are then subjected to connected component analysis to obtain the basic characters and their associated matras. Identifying and extracting the right features with minimal error is one of the most important tasks in automatic document recognition. The ability of various types of features to discriminate Odiya characters is analyzed, and those with better discriminating capability are chosen for the recognition phase. Of the features tested, the projection profiles of the characters gave the best discrimination. In addition, some heuristic features are employed in the final classification phase. An important requirement of a pattern classifier is robustness to noise in the input patterns. To design a robust classifier, various classification techniques reported in the literature are evaluated, including the nearest neighbor (NN), k-nearest neighbor (k-NN), and modified k-NN classifiers. Besides these classical techniques, modern techniques based on Support Vector Machines (SVMs) are also employed.
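As a rough illustration of the projection-based steps summarized in the abstract, the Python sketch below segments a binary page into lines using a horizontal projection, splits each line into words using a vertical projection, and classifies a character with a plain 1-nearest-neighbor rule on its concatenated projection profiles. It is not the thesis implementation: the list-of-rows image representation (1 = ink), the `min_gap` word-gap threshold, the assumption that characters are already size-normalized to a common grid, the toy templates, and all function names are hypothetical choices for this example. Binarization, skew correction, connected component analysis, the heuristic features, the modified k-NN rule, and the SVMs are not shown.

```python
"""Hypothetical sketch of projection-profile segmentation and 1-NN classification.

Assumptions (not from the thesis): a binarized, deskewed page is a list of rows,
with 1 = ink and 0 = background; characters are pre-normalized to one grid size.
"""
from typing import List, Optional, Tuple

Image = List[List[int]]


def horizontal_projection(img: Image) -> List[int]:
    """Ink pixels per row."""
    return [sum(row) for row in img]


def vertical_projection(img: Image) -> List[int]:
    """Ink pixels per column."""
    return [sum(col) for col in zip(*img)]


def ink_runs(profile: List[int], min_gap: int = 1) -> List[Tuple[int, int]]:
    """Half-open (start, end) spans where the profile is non-zero.

    Spans separated by fewer than `min_gap` empty positions are merged,
    so small gaps (e.g. between characters) do not split a word.
    """
    spans: List[Tuple[int, int]] = []
    start: Optional[int] = None
    for i, value in enumerate(profile):
        if value > 0 and start is None:
            start = i
        elif value == 0 and start is not None:
            spans.append((start, i))
            start = None
    if start is not None:
        spans.append((start, len(profile)))

    merged: List[Tuple[int, int]] = []
    for s, e in spans:
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged


def segment_lines(page: Image) -> List[Image]:
    """Cut the page at empty rows of the horizontal projection."""
    return [page[s:e] for s, e in ink_runs(horizontal_projection(page))]


def segment_words(line: Image, min_gap: int = 2) -> List[Image]:
    """Cut a line wherever at least `min_gap` consecutive columns are empty."""
    return [[row[s:e] for row in line]
            for s, e in ink_runs(vertical_projection(line), min_gap)]


def profile_features(char: Image) -> List[int]:
    """Concatenated horizontal and vertical projection profiles."""
    return horizontal_projection(char) + vertical_projection(char)


def nearest_neighbor(features: List[int],
                     train: List[Tuple[List[int], str]]) -> str:
    """1-NN: label of the training vector at minimum squared Euclidean distance."""
    def sq_dist(a: List[int], b: List[int]) -> int:
        return sum((p - q) ** 2 for p, q in zip(a, b))
    return min(train, key=lambda item: sq_dist(features, item[0]))[1]


if __name__ == "__main__":
    page = [
        [1, 1, 0, 0, 0, 1, 0],
        [1, 0, 0, 0, 0, 1, 1],
        [0, 0, 0, 0, 0, 0, 0],
        [0, 1, 1, 0, 0, 0, 0],
    ]
    lines = segment_lines(page)
    print([len(segment_words(line)) for line in lines])  # [2, 1]

    # Toy templates standing in for character classes (hypothetical labels).
    vertical_bar = [[0, 1, 0], [0, 1, 0], [0, 1, 0]]
    horizontal_bar = [[0, 0, 0], [1, 1, 1], [0, 0, 0]]
    train = [(profile_features(vertical_bar), "|"),
             (profile_features(horizontal_bar), "-")]
    noisy = [[0, 1, 0], [0, 1, 1], [0, 1, 0]]  # vertical bar plus one noise pixel
    print(nearest_neighbor(profile_features(noisy), train))  # |
```

Running the script prints `[2, 1]` (two words on the first text line, one on the second) and then `|`, since the noisy pattern's profile vector lies closer to the vertical-bar template than to the horizontal one. A fixed `min_gap` is a simplification; a real system would likely derive the word-gap threshold from the spacing statistics of each line.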
dc.language.iso	en_US
dc.relation.ispartofseries	T04776
dc.rights	I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.
dc.subject	Odiya Script
dc.subject	Skew Detection and Binarization
dc.subject	Pattern Classification SVM, k-NN
dc.title	Machine recognition of printed Odiya text
dc.type	Thesis
dc.degree.name	MSc Engg
dc.degree.level	Masters
dc.degree.grantor	Indian Institute of Science
dc.degree.discipline	Engineering

Files in this item

Name: T04776.pdf
Size: 21.65 MB
Format: PDF

This item appears in the following Collection(s)

Electrical Engineering (EE) [451]