dc.contributor.advisor: Ramakrishnan, A G
dc.contributor.author: Pati, Peeta Basa
dc.date.accessioned: 2025-10-15T11:09:16Z
dc.date.available: 2025-10-15T11:09:16Z
dc.date.submitted: 2000
dc.identifier.uri: https://etd.iisc.ac.in/handle/2005/7182
dc.description.abstract: Automatic recognition of characters by a machine is one of the challenging problems in Artificial Intelligence. The motivation for designing such a machine comes from the human visual system (HVS). The HVS is remarkably versatile: it is the ultimate physical (albeit neural) realization of a pattern recognition system, and its performance is unaffected by geometric transformations of patterns, such as characters rendered in various styles and sizes. The prime goal of such a machine is to replace the HVS in practical applications involving repetitive, monotonous tasks. Examples include mass digitization of printed manuscripts, processing of letters and mail in postal services, job applications, and banking documents. Most research efforts and commercial software packages focus on the Roman script. For Indian scripts, automatic recognition remains a topic of considerable interest. This thesis presents an attempt to develop an integrated Optical Character Recognition (OCR) system for printed Odiya script. Automatic document recognition comprises the following major subtasks:
Digitization: converting hard copies of the manuscript into digital images that can be processed on a computer.
Preprocessing: noise removal, skew detection and correction, and binarization of the gray-level digital image.
Segmentation: separating the preprocessed image hierarchically into lines, then words, then characters.
Feature Extraction: obtaining from each character the attributes, called features, that distinguish it from other characters.
Classification: using the extracted features to decide the class to which a test pattern belongs.
In this thesis, a novel binarization technique based on windows of variable width is developed and implemented.
The width of the window is selected based on the local statistics of the image. Skew in the document is detected with a precise two-level skew detection algorithm that employs the Hough transform and statistical properties of the image. Individual text lines are segmented using horizontal projection profiles, and words are separated from lines using vertical projection profiles. The segmented words then undergo connected component analysis to obtain the basic characters and their associated matras. Identifying and extracting the right features with minimal error is one of the most important tasks in automatic document recognition. The ability of various types of features to discriminate between Odiya characters is analyzed, and the features with better discriminating capability are chosen for the recognition phase. Among the features tested, the projection profiles of the characters were found to give the best discrimination. In addition, some heuristic features are used in the final classification phase. An important requirement of a pattern classifier is robustness to noise in the input patterns. To design a robust classifier, several classification techniques reported in the literature are tried, including the nearest neighbor, k-NN, and modified k-NN classifiers. Besides these classical techniques, the more modern Support Vector Machines (SVMs) are also employed.
dc.language.iso: en_US
dc.relation.ispartofseries: T04776
dc.rights: I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.
dc.subject: Odiya Script
dc.subject: Skew Detection and Binarization
dc.subject: Pattern Classification SVM, k-NN
dc.title: Machine recognition of printed Odiya text
dc.type: Thesis
dc.degree.name: MSc Engg
dc.degree.level: Masters
dc.degree.grantor: Indian Institute of Science
dc.degree.discipline: Engineering
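
The abstract does not state how the window width is derived from the local statistics. The following Python sketch illustrates one plausible variable-window binarization under an explicitly assumed rule. The window around each pixel is widened until its gray-level standard deviation reaches a contrast threshold, and the pixel is then compared with the local mean. The function names and the parameters `min_half`, `max_half`, and `min_std` are hypothetical and are not taken from the thesis.

```python
from statistics import mean, pstdev


def window_values(img, r, c, half):
    """Gray values in the (2*half+1)-wide square window centered on (r, c), clipped at the borders."""
    rows, cols = len(img), len(img[0])
    return [
        img[i][j]
        for i in range(max(0, r - half), min(rows, r + half + 1))
        for j in range(max(0, c - half), min(cols, c + half + 1))
    ]


def binarize_variable_window(img, min_half=1, max_half=7, min_std=20.0):
    """Binarize a gray image given as a list of rows (0 = black, 255 = white).

    Hypothetical rule (an assumption, not the thesis's method): start with a
    small window and widen it until the local standard deviation reaches
    min_std, i.e. the window sees both ink and paper. Then threshold the
    pixel at the local mean. Windows that never reach min_std fall back to
    the global mean. Returns 1 for ink and 0 for paper.
    """
    rows, cols = len(img), len(img[0])
    global_mean = mean(v for row in img for v in row)
    out = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            half = min_half
            win = window_values(img, r, c, half)
            # Widen the window while it is too uniform to give a reliable threshold.
            while pstdev(win) < min_std and half < max_half:
                half += 1
                win = window_values(img, r, c, half)
            threshold = mean(win) if pstdev(win) >= min_std else global_mean
            # Pixels darker than the threshold are marked as ink.
            out[r][c] = 1 if img[r][c] < threshold else 0
    return out
```

For example, consider a 6-by-6 image of gray value 200 containing a 4-pixel vertical stroke of value 40. Only the stroke pixels are marked as ink. The per-pixel window scan is slow in pure Python. A practical version would compute window sums and variances with an integral image.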

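Line and word segmentation with projection profiles can be sketched as follows. The sketch makes two assumptions:

- The input is a binary image (1 = ink) that has already been deskewed.
- Inter-word gaps are wider than inter-character gaps, so vertical runs separated by fewer than `word_gap` empty columns are merged into one word.

`word_gap` and the helper names are hypothetical. The abstract does not give the thesis's actual thresholds.

```python
def runs(profile, min_gap=1):
    """(start, end) pairs, end exclusive, of consecutive nonzero profile entries.

    Runs separated by fewer than min_gap zero entries are merged.
    """
    segments, start = [], None
    for i, v in enumerate(profile):
        if v > 0 and start is None:
            start = i
        elif v == 0 and start is not None:
            segments.append((start, i))
            start = None
    if start is not None:
        segments.append((start, len(profile)))
    merged = []
    for s, e in segments:
        # Merge with the previous run when the gap between them is small.
        if merged and s - merged[-1][1] < min_gap:
            merged[-1] = (merged[-1][0], e)
        else:
            merged.append((s, e))
    return merged


def horizontal_profile(img):
    """Number of ink pixels in each row."""
    return [sum(row) for row in img]


def vertical_profile(img):
    """Number of ink pixels in each column."""
    return [sum(col) for col in zip(*img)]


def segment_lines_and_words(img, word_gap=2):
    """Split a binary page into text lines, then each line into words."""
    result = []
    for top, bottom in runs(horizontal_profile(img)):
        line = img[top:bottom]
        words = runs(vertical_profile(line), min_gap=word_gap)
        result.append(((top, bottom), words))
    return result
```

Each result entry pairs a line's (top, bottom) row range with the (left, right) column ranges of its words, with end indices exclusive. Connected component analysis would then be applied within each word range to separate basic characters from matras.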

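The abstract reports that projection-profile features gave the best discrimination, and that nearest neighbor and k-NN classifiers were among those tried. A minimal sketch combining the two is shown below. Both of the following choices are assumptions for illustration:

- Every glyph has already been scaled to a common size, so feature vectors have equal length.
- Each profile is normalized by the glyph's ink count.

The modified k-NN variant and the SVM classifier mentioned in the abstract are not reproduced here. The toy glyphs in the demo are invented for the example.

```python
import math
from collections import Counter


def projection_features(glyph):
    """Row and column ink counts of a binary glyph, normalized by total ink.

    Assumes glyphs are pre-scaled to a common size so vectors are comparable.
    """
    ink = sum(map(sum, glyph)) or 1  # avoid division by zero for empty glyphs
    rows = [sum(r) / ink for r in glyph]
    cols = [sum(c) / ink for c in zip(*glyph)]
    return rows + cols


def knn_classify(train, query, k=1):
    """Majority label among the k training vectors closest to query (Euclidean).

    train is a list of (feature_vector, label) pairs; k=1 is nearest neighbor.
    """
    dists = sorted((math.dist(f, query), label) for f, label in train)
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]


if __name__ == "__main__":
    # Toy 3x3 glyphs, invented for illustration: vertical bars "I", horizontal bars "-".
    train = [
        (projection_features([[0, 1, 0], [0, 1, 0], [0, 1, 0]]), "I"),
        (projection_features([[1, 0, 0], [1, 0, 0], [1, 0, 0]]), "I"),
        (projection_features([[0, 0, 0], [1, 1, 1], [0, 0, 0]]), "-"),
        (projection_features([[1, 1, 1], [0, 0, 0], [0, 0, 0]]), "-"),
    ]
    noisy_bar = [[0, 1, 0], [0, 1, 1], [0, 1, 0]]
    print(knn_classify(train, projection_features(noisy_bar), k=1))  # prints I
```

With `k=1`, the function is a plain nearest neighbor classifier. In the demo, a vertical bar with one extra noise pixel is still classified as `I`. Larger `k` values need a training set with several samples per class to vote reliably.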