    •   etd@IISc
    • Division of Electrical, Electronics, and Computer Science (EECS)
    • Electrical Engineering (EE)

    Machine recognition of printed Odiya text

    T04776.pdf (21.65Mb)
    Author
    Pati, Peeta Basa
    Abstract
    Automatic recognition of characters by a machine is one of the challenging problems in Artificial Intelligence. The motivation for designing such a machine comes from the human visual system (HVS). The HVS is endowed with astonishing versatility and constitutes the ultimate physical (albeit neural) realization of a pattern recognition system, one whose performance is unaffected by geometric transformations of patterns, such as characters of various styles and sizes. The prime goal in designing such a machine is to replace the HVS in practical applications involving repetitive, monotonous tasks, such as mass digitization of printed manuscripts and the processing of letters and mail in postal services, job applications, and banking documents. Most research efforts and commercial software packages focus on the Roman script; for Indian scripts, automatic recognition is still a topic of considerable interest. This thesis presents an attempt to develop an integrated Optical Character Recognition (OCR) system for printed Odiya script. Automatic document recognition comprises the following major subtasks:
    • Digitization: converting hard copies of manuscripts into digital images that can be processed on a computer.
    • Preprocessing: noise removal, skew detection and correction, and binarization of the gray-scale digital image.
    • Segmentation: separating the preprocessed image into lines, words, and characters, in that hierarchy.
    • Feature extraction: obtaining from each character the attributes, called features, that make it distinct from other characters.
    • Classification: using the extracted features to decide the class to which a test pattern belongs.
    In this thesis, a novel binarization technique based on windows of variable width is developed and implemented.
The width of the window is selected based on the local statistics of the image. Skew in the document is detected with a precise, two-level skew detection algorithm that employs the Hough transform and statistical properties of the image. Individual text lines are segmented using horizontal projection vectors, and words are separated from each line using vertical projection vectors. The segmented words are then subjected to connected component analysis to obtain the basic characters and their associated matras. Identifying and extracting the right features with minimal error is one of the most important tasks in automatic document recognition. The ability of various types of features to discriminate Odiya characters is analyzed, and the features with better discriminating capability are chosen for the recognition phase. Of the features tested, the projection profiles of the characters yielded better discrimination than the others. In addition, some heuristic features are employed in the final classification phase. An important requirement of a pattern classifier is robustness to noise in the input patterns. To design a robust classifier, several classification techniques reported in the literature are tried, including the nearest neighbor, k-NN, and modified k-NN classifiers. Beyond these classical techniques, modern Support Vector Machines (SVMs) are also employed.
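The abstract states only that the binarization window width is chosen from local image statistics. The sketch below is one possible reading, not the thesis's method: it assumes a Niblack-style local threshold (local mean plus `k` times local standard deviation) and widens the window around each pixel until its standard deviation reaches a contrast floor. The parameters `min_half`, `max_half`, `min_std`, and `k`, their defaults, and all function names are hypothetical.

```python
import numpy as np


def summed_area(img):
    """Summed-area table with a leading zero row and column, so any window sum is O(1)."""
    return np.pad(img, ((1, 0), (1, 0))).cumsum(axis=0).cumsum(axis=1)


def window_stats(S, S2, r, c, half, shape):
    """Mean and standard deviation of the (2*half+1)-wide window centred on (r, c),
    clipped at the image border."""
    r0, r1 = max(r - half, 0), min(r + half + 1, shape[0])
    c0, c1 = max(c - half, 0), min(c + half + 1, shape[1])
    n = (r1 - r0) * (c1 - c0)
    s = S[r1, c1] - S[r0, c1] - S[r1, c0] + S[r0, c0]
    s2 = S2[r1, c1] - S2[r0, c1] - S2[r1, c0] + S2[r0, c0]
    mean = s / n
    return mean, max(s2 / n - mean * mean, 0.0) ** 0.5


def binarize_variable_window(gray, min_half=3, max_half=15, min_std=10.0, k=-0.2):
    """Return a boolean mask, True where a pixel is classified as (dark) ink.
    Hypothetical variable-window rule; parameters are illustrative."""
    g = gray.astype(np.float64)
    S, S2 = summed_area(g), summed_area(g * g)
    ink = np.zeros(g.shape, dtype=bool)
    for r in range(g.shape[0]):
        for c in range(g.shape[1]):
            half = min_half
            mean, std = window_stats(S, S2, r, c, half, g.shape)
            # Widen the window until it contains enough contrast or reaches max_half.
            while std < min_std and half < max_half:
                half += 2
                mean, std = window_stats(S, S2, r, c, half, g.shape)
            # Flat windows (no contrast even at max_half) are left as background.
            if std >= min_std:
                # Niblack-style local threshold T = mean + k * std.
                ink[r, c] = g[r, c] < mean + k * std
    return ink
```

Growing the window only where the neighbourhood is flat keeps small windows (and fine detail) near strokes, while background regions borrow statistics from a wider area instead of amplifying noise.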
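A two-level, Hough-based skew search might be organized as below. Restricting the Hough accumulator to near-horizontal lines, each candidate angle yields one accumulator column. As an assumption standing in for the thesis's unspecified "statistical properties", this sketch scores a column by the sum of its squared counts, which is largest when ink pixels collapse onto a few sharp text-line peaks. The search range of 15 degrees, the 1 and 0.1 degree step sizes, and the function names are hypothetical.

```python
import numpy as np


def hough_concentration(ys, xs, angle_deg):
    """Score one Hough accumulator column for a candidate skew angle.

    Pixels on a text line y = x * tan(a) + b share rho = y * cos(a) - x * sin(a),
    so at the true skew angle the rho histogram is most sharply peaked.
    """
    a = np.deg2rad(angle_deg)
    rho = ys * np.cos(a) - xs * np.sin(a)
    counts = np.bincount(np.round(rho - rho.min()).astype(int)).astype(np.float64)
    return float(np.sum(counts ** 2))  # squared counts reward sharp peaks


def detect_skew(ink, max_angle=15.0, coarse_step=1.0, fine_step=0.1):
    """Two-level skew search in degrees; a positive angle means text lines
    descend to the right in (row, column) image coordinates."""
    ys, xs = np.nonzero(ink)
    if ys.size == 0:
        return 0.0  # blank page: nothing to measure
    ys, xs = ys.astype(np.float64), xs.astype(np.float64)

    def score(angle):
        return hough_concentration(ys, xs, angle)

    # Level 1: coarse grid over [-max_angle, max_angle].
    coarse = np.arange(-max_angle, max_angle + coarse_step / 2, coarse_step)
    best = max(coarse, key=score)
    # Level 2: fine grid within one coarse step of the coarse estimate.
    fine = np.arange(best - coarse_step, best + coarse_step + fine_step / 2, fine_step)
    return float(max(fine, key=score))
```

The coarse pass keeps the number of full accumulator evaluations small; only the neighbourhood of the coarse winner is examined at the finer resolution.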
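Line and word segmentation by projection vectors can be sketched as follows. It assumes a binary ink mask (True = ink) of an already deskewed page. The minimum inter-word gap of 3 columns (`min_gap`) is a hypothetical value that keeps the narrow spaces between characters from splitting a word; the function names are illustrative.

```python
import numpy as np


def ink_runs(profile, min_gap=1):
    """Return (start, end) pairs (end exclusive) of non-zero runs in a projection
    profile, merging runs separated by fewer than min_gap empty positions."""
    spans, start, end, gap = [], None, 0, 0
    for i, v in enumerate(profile):
        if v > 0:
            if start is None:
                start = i
            gap, end = 0, i + 1
        elif start is not None:
            gap += 1
            if gap >= min_gap:  # gap wide enough: close the current run
                spans.append((start, end))
                start = None
    if start is not None:
        spans.append((start, end))
    return spans


def segment_lines(ink):
    """Text lines from the horizontal projection (ink count per row)."""
    return ink_runs(ink.sum(axis=1))


def segment_words(line_ink, min_gap=3):
    """Words in one line from the vertical projection (ink count per column)."""
    return ink_runs(line_ink.sum(axis=0), min_gap=min_gap)
```

For a line spanning rows `(top, bottom)`, the words are `segment_words(ink[top:bottom])`, giving column spans within that line.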
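The connected component step might look like the following sketch, which labels 8-connected ink regions of a word image by breadth-first search and returns their bounding boxes from left to right. The abstract does not say how components are assigned to basic characters or matras, so no grouping rule is attempted here; the function name and the box layout `(top, left, bottom, right, area)` are hypothetical.

```python
from collections import deque

import numpy as np


def connected_components(ink):
    """Label 8-connected ink regions; return (top, left, bottom, right, area) tuples
    with exclusive bottom/right bounds, sorted by left edge."""
    h, w = ink.shape
    seen = np.zeros((h, w), dtype=bool)
    comps = []
    for r in range(h):
        for c in range(w):
            if not ink[r, c] or seen[r, c]:
                continue
            seen[r, c] = True
            queue = deque([(r, c)])
            top, left, bottom, right, area = r, c, r, c, 0
            while queue:  # breadth-first flood fill of one component
                y, x = queue.popleft()
                area += 1
                top, bottom = min(top, y), max(bottom, y)
                left, right = min(left, x), max(right, x)
                for dy in (-1, 0, 1):
                    for dx in (-1, 0, 1):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < h and 0 <= nx < w and ink[ny, nx] and not seen[ny, nx]:
                            seen[ny, nx] = True
                            queue.append((ny, nx))
            comps.append((top, left, bottom + 1, right + 1, area))
    return sorted(comps, key=lambda box: box[1])
```

Eight-connectivity is chosen so that diagonally touching stroke pixels stay in one component; four-connectivity would split thin slanted strokes.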
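One way to compute projection-profile features for a segmented character is shown below. It assumes the character has already been cropped to its bounding box. The nearest-neighbour size normalization, the 32 x 32 grid (`size`), and scaling by total ink are illustrative choices, not settings stated in the thesis; the function names are hypothetical.

```python
import numpy as np


def resize_nearest(img, size):
    """Nearest-neighbour resampling of a 2-D array to size x size."""
    h, w = img.shape
    rows = np.arange(size) * h // size
    cols = np.arange(size) * w // size
    return img[np.ix_(rows, cols)]


def projection_features(char_ink, size=32):
    """Concatenate the horizontal (per-row) and vertical (per-column) projection
    profiles of a size-normalized character, each divided by the total ink."""
    g = resize_nearest(char_ink.astype(np.float64), size)
    total = g.sum()
    if total == 0:
        return np.zeros(2 * size)  # empty glyph: all-zero feature vector
    return np.concatenate([g.sum(axis=1), g.sum(axis=0)]) / total
```

Dividing by the total ink makes each half of the vector sum to 1, so stroke thickness affects the features less than the distribution of ink across rows and columns.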
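The nearest neighbor, k-NN, and modified k-NN rules can be compared with one small function. "Modified k-NN" has several variants in the literature, and the abstract does not say which one the thesis uses; this sketch assumes Dudani's distance-weighted rule, in which the i-th nearest of the k neighbours gets weight (d_k - d_i) / (d_k - d_1). With `k=1` it reduces to the plain nearest neighbor classifier. The function name and Euclidean distance are assumptions.

```python
from collections import defaultdict

import numpy as np


def knn_predict(train_x, train_y, x, k=1, weighted=False):
    """Classify x by its k nearest training samples (Euclidean distance).
    weighted=True applies distance weighting (one 'modified k-NN' variant)."""
    d = np.linalg.norm(train_x - x, axis=1)
    idx = np.argsort(d, kind="stable")[:k]
    d1, dk = d[idx[0]], d[idx[-1]]
    votes = defaultdict(float)
    for i in idx:
        if weighted and dk > d1:
            w = (dk - d[i]) / (dk - d1)  # nearest gets 1, k-th nearest gets 0
        else:
            w = 1.0  # plain majority vote
        votes[train_y[i]] += w
    # Ties go to the label seen first, i.e. the one of the nearest neighbour.
    return max(votes, key=votes.get)
```

With one class-A sample close to a query and two class-B samples farther away, plain 3-NN votes B, while the weighted rule lets the close A sample dominate and returns A.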
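For the SVM stage, the abstract does not give the kernel, training method, or multi-class scheme. The sketch below assumes a linear SVM trained with the Pegasos stochastic sub-gradient method, combined one-versus-rest by taking the class with the largest decision value. `lam`, `epochs`, and all function names are hypothetical; for simplicity the bias is folded into the weight vector and is therefore regularized too.

```python
import numpy as np


def train_linear_svm(X, y, lam=0.1, epochs=200, seed=0):
    """Pegasos-style training of a linear SVM; y holds +1/-1 labels.
    A constant 1 feature is appended so the last weight acts as a bias."""
    rng = np.random.default_rng(seed)
    Xb = np.hstack([X, np.ones((len(X), 1))])
    w = np.zeros(Xb.shape[1])
    t = 0
    for _ in range(epochs):
        for i in rng.permutation(len(Xb)):
            t += 1
            eta = 1.0 / (lam * t)  # Pegasos step size
            margin = y[i] * (Xb[i] @ w)  # evaluated before the update
            w *= 1.0 - eta * lam  # shrink: gradient of the L2 regularizer
            if margin < 1:  # hinge loss active: step towards the sample
                w += eta * y[i] * Xb[i]
    return w


def train_one_vs_rest(X, labels, **kwargs):
    """One binary SVM per class: that class as +1, all others as -1."""
    labels = np.asarray(labels)
    return {c: train_linear_svm(X, np.where(labels == c, 1.0, -1.0), **kwargs)
            for c in sorted(set(labels.tolist()))}


def predict(models, x):
    """Return the class whose SVM gives the largest decision value."""
    xb = np.append(x, 1.0)
    return max(models, key=lambda c: models[c] @ xb)
```

A kernel SVM would replace the linear decision value with a kernel expansion over support vectors; the linear form is used here only to keep the sketch short and dependency-free.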
    URI
    https://etd.iisc.ac.in/handle/2005/7182
    Collections
    • Electrical Engineering (EE) [392]

    etd@IISc is a joint service of SERC & J R D Tata Memorial (JRDTML) Library || Powered by DSpace software || DuraSpace
    Contact Us | Send Feedback | Thesis Templates
    Theme by Atmire NV