Attention-Feedback and Representations in OCR

Shiva Kumar, H R

dc.contributor.advisor	Ramakrishnan, A G
dc.contributor.author	Shiva Kumar, H R
dc.date.accessioned	2021-03-11T06:16:02Z
dc.date.available	2021-03-11T06:16:02Z
dc.date.submitted	2019
dc.identifier.uri	https://etd.iisc.ac.in/handle/2005/4957
dc.description.abstract	A Kannada OCR, named Lipi Gnani, has been designed and developed from scratch, with the aim of converting printed text or poetry in Kannada script without any restriction on vocabulary. The training and test sets have been collected from over 35 books published between 1970 and 2002, including books written in Halegannada and pages containing Sanskrit slokas written in Kannada script. The coverage of the OCR is nearly complete in the sense that it recognizes all the punctuation marks, special symbols, Indo-Arabic and Kannada numerals, and also interspersed English words. Several minor and major original contributions have been made in developing this OCR at the different processing stages, such as binarization, line and character segmentation, recognition and Unicode mapping. The result is a Kannada OCR that performs as well as, and in some cases better than, Google’s Tesseract OCR, as shown by the results. To the knowledge of the authors, this is the first report of a complete Kannada OCR handling all the issues involved. Currently, there is no dictionary-based postprocessing, and the obtained results are due solely to the recognition process. Four benchmark test datasets containing scanned pages from books in the Kannada, Sanskrit, Konkani and Tulu languages, all printed in Kannada script, have been created, along with the ground truth in Unicode. The word-level recognition accuracy of Lipi Gnani is 5.3% higher on the Kannada dataset than that of Google’s Tesseract OCR, 8.5% higher on the Sanskrit dataset, and 23.4% higher on the datasets of Konkani and Tulu. 
Inspired by the rich feedback in the visual neural pathway that is active during recognition, we have proposed the use of feedback from the later modules in the OCR workflow, such as recognition and Unicode generation, to the earlier stages, such as binarization and segmentation, to improve the overall performance of the OCR on old documents. The system looks for singularities and inconsistencies in the sequence of recognition labels for each word image and in the recognition scores output by the classifier and, based on these indicators, suspects merged or split characters or interspersed English words. A nonlinear, locally adaptive enhancement method is then applied to the original, segmented gray-level image of the word, implemented in a slightly different manner for merged and for split characters. Multiple images of the word, enhanced to different extents, are binarized and segmented into symbols, and the best enhanced image is chosen based on the best overall recognition score for the word. If the anomaly still persists, the system suspects the word image to be an interspersed English word in an otherwise Kannada document. The segmented components of the word are then re-recognized as English characters by a different classifier, trained on the Latin script. The effectiveness of the proposed attention-feedback processing has been thoroughly tested on a challenging dataset of 250 pages of Kannada, which also includes some Halegannada pages. It has also been tested on three other datasets containing 40+ pages of Tulu, Konkani and Sanskrit text, printed in Kannada script. The overall attention-feedback processing improves the word-level recognition accuracy by 4.56% on the Kannada dataset, 2.4% on the Tulu and Konkani datasets and 6.3% on the Sanskrit dataset. 
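The control flow of the attention-feedback step described above can be sketched roughly as follows. This is only an illustration and not the Lipi Gnani implementation: the names `WordResult`, `looks_anomalous` and `recognize_with_feedback`, the use of the mean symbol score as the "overall recognition score", the fixed threshold `min_score`, and the rule of accepting the Latin result only when it scores higher are all assumptions. The recognizers and the enhancement routine are passed in as callables because the abstract does not specify them.

```python
from dataclasses import dataclass
from typing import Any, Callable, List, Optional, Sequence


@dataclass
class WordResult:
    """Recognition output for one word image (hypothetical container)."""
    labels: List[str]    # one label per segmented symbol
    scores: List[float]  # classifier score per symbol

    @property
    def overall_score(self) -> float:
        # Assumption: the overall word score is the mean symbol score.
        return sum(self.scores) / len(self.scores) if self.scores else 0.0


def looks_anomalous(result: WordResult, min_score: float) -> bool:
    # Assumption: a word is suspect if any symbol scores below a threshold.
    return any(s < min_score for s in result.scores)


def recognize_with_feedback(
    gray_word: Any,
    recognize: Callable[[Any], WordResult],
    enhance: Callable[[Any, float], Any],
    strengths: Sequence[float] = (0.5, 1.0, 1.5),
    min_score: float = 0.6,
    recognize_latin: Optional[Callable[[Any], WordResult]] = None,
) -> WordResult:
    """Re-run recognition on enhanced copies of a suspect word image."""
    best = recognize(gray_word)
    if not looks_anomalous(best, min_score):
        return best  # no feedback needed

    # Enhance to different extents and keep the best-scoring attempt.
    for strength in strengths:
        candidate = recognize(enhance(gray_word, strength))
        if candidate.overall_score > best.overall_score:
            best = candidate

    # If the anomaly persists, try a classifier trained on the Latin script.
    if looks_anomalous(best, min_score) and recognize_latin is not None:
        latin = recognize_latin(gray_word)
        if latin.overall_score > best.overall_score:
            best = latin
    return best
```

In this sketch `recognize` is expected to binarize, segment and classify the word image it is given, so each enhanced copy passes through the earlier stages again, which is the feedback path the paragraph describes.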
We have also proposed an elegant and unique algorithm for the segmentation of text-lines from printed and handwritten documents, using a red-black tree and bipartite graph representation. We first represent each connected component (CC) in a document page as a row interval and then exploit the properties of the red-black tree (RBT) data structure to collect the appropriate intervals (CCs) in the different nodes (text-lines) of the tree. We initially construct an RBT by inserting the row intervals of all the mid-sized connected components into the tree. While inserting an interval, we recursively merge all the intervals that have significant overlap into a single enclosing interval. Tall CCs, which may arise from the touching of components from adjacent lines, are inserted into the tree after cutting, if needed. Non-overlapping short components, which may include diacritical marks, are considered last and inserted into the closest intervals. Once all the CCs of the document page are inserted, the RBT has one node for each segmented text-line, and an in-order tree traversal yields the lines in sorted order. The algorithm is computationally efficient, since each CC is processed only once in creating the tree and the time complexity of RBT search/edit operations is of the order of the logarithm of the number of lines. We have thoroughly tested our red-black tree and bipartite graph based line segmentation algorithm on many standard datasets. Results on the ICDAR-2013 Handwriting-Segmentation-Contest dataset (English, Greek, Bangla) show that our approach marginally outperforms the state-of-the-art text-line segmentation methods reported on this dataset. Results on the ICDAR-2009 and PBOK datasets (French, German, Kannada, Oriya) show that it also scales to these Indic and European languages. We have also developed an intuitive, user-friendly GUI for OCR, called PrintToBraille. 
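The interval-merging idea behind the text-line segmentation described above can be illustrated with a small sketch. It is not the thesis algorithm: a sorted Python list stands in for the red-black tree (so insertion here is linear rather than logarithmic in the number of lines), the cutting of tall CCs and the bipartite graph representation are omitted, and the names `overlap_ratio`, `insert_merging`, `closest_line`, `segment_lines` and the 0.5 overlap threshold are assumptions made for illustration.

```python
from bisect import insort
from typing import List, Sequence, Tuple

Interval = Tuple[int, int]  # (top_row, bottom_row), both inclusive


def overlap_ratio(a: Interval, b: Interval) -> float:
    """Shared rows divided by the height of the shorter interval."""
    shared = min(a[1], b[1]) - max(a[0], b[0]) + 1
    shorter = min(a[1] - a[0], b[1] - b[0]) + 1
    return max(shared, 0) / shorter


def insert_merging(lines: List[Interval], cc: Interval,
                   min_overlap: float = 0.5) -> None:
    """Insert a CC's row interval, absorbing every line it overlaps enough."""
    merged = cc
    changed = True
    while changed:  # repeat: the grown interval may now overlap other lines
        changed = False
        for iv in lines:
            if overlap_ratio(merged, iv) >= min_overlap:
                lines.remove(iv)
                merged = (min(merged[0], iv[0]), max(merged[1], iv[1]))
                changed = True
                break
    insort(lines, merged)  # keep lines sorted by top row


def closest_line(lines: Sequence[Interval], cc: Interval) -> int:
    """Index of the line whose vertical centre is nearest to the CC's."""
    centre = (cc[0] + cc[1]) / 2
    return min(range(len(lines)),
               key=lambda i: abs((lines[i][0] + lines[i][1]) / 2 - centre))


def segment_lines(mid_ccs: Sequence[Interval],
                  short_ccs: Sequence[Interval] = ()):
    """Return (sorted line intervals, line index for each short CC)."""
    lines: List[Interval] = []
    for cc in mid_ccs:        # mid-sized components build the lines first
        insert_merging(lines, cc)
    # Short components (e.g. diacritical marks) are attached last.
    assignment = [closest_line(lines, cc) for cc in short_ccs]
    return lines, assignment
```

For example, `segment_lines([(10, 30), (12, 28), (50, 70), (48, 66)], [(44, 46)])` groups the four mid-sized intervals into two lines and attaches the short component to the second one. Reading the sorted list front to back plays the role of the in-order traversal of the tree.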
The PrintToBraille GUI has a facility to recognize individual scanned pages or all the pages of an entire book. The latter facility was specifically added to help NGOs create Braille versions of school texts for the use of blind children. Thus, the output text of the OCR can be saved in .rtf, .xml or braille format. It also has a provision to save the recognized Unicode text, along with the line and word boundaries, in the industry-standard METS/ALTO XML format. The Lipi Gnani Kannada OCR and the PrintToBraille GUI, both developed in Java, can be run on Windows, Linux and Mac operating systems. A setup/installer program has also been made available for Windows users to ease installation and running.	en_US
dc.language.iso	en_US	en_US
dc.relation.ispartofseries	;G29786
dc.rights	I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation	en_US
dc.subject	Lipi Gnani Kannada OCR	en_US
dc.subject	PrintToBraille GUI	en_US
dc.subject	Kannada lipi OCR	en_US
dc.subject	OCR	en_US
dc.subject.classification	Research Subject Categories::TECHNOLOGY::Information technology	en_US
dc.title	Attention-Feedback and Representations in OCR	en_US
dc.type	Thesis	en_US
dc.degree.name	PhD	en_US
dc.degree.level	Doctoral	en_US
dc.degree.grantor	Indian Institute of Science	en_US
dc.degree.discipline	Engineering	en_US

Files in this item

Name:: G29786.pdf
Size:: 7.976Mb
Format:: PDF
Description:: Thesis full text


This item appears in the following Collection(s)

Electrical Engineering (EE) [398]
