Computer generation and recognition of printed Telugu characters
Abstract
The thesis is primarily concerned with two problems, namely, generation of Telugu characters and recognition of printed Telugu characters.
In addition to the practical utility of generative schemes for pictorial patterns, the problem of generating a large number of structured pictorial patterns is an interesting research problem in itself. In particular, various research workers have concentrated on the generation of characters of different alphabets. In this thesis, a generative scheme for Telugu characters is described; it is a modified version of Shaw's 'Picture Description Language (PDL)'. The basic components of PDL are primitives and concatenation operators. A primitive is a prototype of a basic pattern class, such as a horizontal line. It is described by a name and a tail-head specification.
Primitives can be geometrically concatenated only at their tail and head points to form intermediate picture entities, each of which in turn has two points at which it may connect to other entities. Restricting concatenation to two points of a primitive or intermediate picture fragment severely limits the choice of primitives when the patterns to be generated contain junctions with many branches, and it leads to the selection of a larger number of picture primitives. Moreover, PDL assumes that all pictures are connected. This assumption is realized by defining blank (invisible) primitives to provide connectivity between what would otherwise be isolated subpictures, but the use of blank primitives results in quite unwieldy generative rules.
The above problems are alleviated by using a new relational operator, called the 'shift operator', whose sole purpose is to shift the head point of an intermediate picture pattern to any desired location in the area in which the picture is generated.
In addition to the shift operator, the following three relational operators are used to generate Telugu characters:
'Concatenation' operator to combine picture fragments
'Reversal' operator to reverse the head-tail specification of primitives
'Return' operator to return to the head point of the previous intermediate picture pattern at which the current primitive is attached.
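The four operators can be illustrated with a minimal sketch. It is not the thesis's implementation: the `Fragment` type, the function names, and the straight-line primitives are assumptions introduced here, and the 'return' operator is modelled under one reading of its description (attach a fragment, then restore the head to the attachment point).

```python
from dataclasses import dataclass
from typing import Tuple

Point = Tuple[float, float]
Segment = Tuple[Point, Point]


@dataclass(frozen=True)
class Fragment:
    """A picture fragment: visible line segments plus a tail and a head point."""
    segments: Tuple[Segment, ...]
    tail: Point
    head: Point


def primitive(dx: float, dy: float) -> Fragment:
    """Hypothetical straight-line primitive from (0, 0) to (dx, dy)."""
    start, end = (0.0, 0.0), (float(dx), float(dy))
    return Fragment(((start, end),), start, end)


def translate(f: Fragment, ox: float, oy: float) -> Fragment:
    """Move a whole fragment by the offset (ox, oy)."""
    def move(p: Point) -> Point:
        return (p[0] + ox, p[1] + oy)
    segs = tuple((move(a), move(b)) for a, b in f.segments)
    return Fragment(segs, move(f.tail), move(f.head))


def concatenate(a: Fragment, b: Fragment) -> Fragment:
    """Concatenation: place the tail of b on the head of a."""
    moved = translate(b, a.head[0] - b.tail[0], a.head[1] - b.tail[1])
    return Fragment(a.segments + moved.segments, a.tail, moved.head)


def reverse(f: Fragment) -> Fragment:
    """Reversal: swap the head-tail specification."""
    return Fragment(f.segments, f.head, f.tail)


def shift(f: Fragment, to: Point) -> Fragment:
    """Shift: move only the head point to any desired location."""
    return Fragment(f.segments, f.tail, to)


def attach_and_return(a: Fragment, b: Fragment) -> Fragment:
    """Return (one reading): attach b, then go back to the attachment point."""
    joined = concatenate(a, b)
    return Fragment(joined.segments, joined.tail, a.head)


horizontal = primitive(1, 0)
vertical = primitive(0, 1)
corner = concatenate(horizontal, vertical)          # head moves to the far end
junction = attach_and_return(horizontal, vertical)  # head stays at the junction
```

In this sketch a junction with several branches is built by repeated `attach_and_return` calls, and a disconnected subpicture is started by calling `shift` first, so no blank primitive is needed.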
The generative scheme has been simulated to generate Telugu characters with fixed attributes. Practical applications for the generative scheme include text editing and poster generation.
In the design and evaluation stages of character recognition systems, a large number of characters are required in order to collect statistical information and to test the efficiency of recognition systems; in such situations, there are several advantages in using artificial, rather than natural, data as a test medium. Hence, as part of the generative scheme, procedures to vary the size, skew, and thickness of characters are discussed.
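As a rough illustration of such variations, the sketch below scales and shears the points of a character outline and thickens a rasterized character by dilation. The function names and the choice of a shear for skew and a square dilation for thickness are assumptions made here; the abstract does not specify the procedures actually used.

```python
from typing import Iterable, List, Set, Tuple


def scale_and_skew(points: Iterable[Tuple[float, float]],
                   scale: float, skew: float) -> List[Tuple[float, float]]:
    """Shear each point horizontally by skew * y, then scale uniformly."""
    return [(scale * (x + skew * y), scale * y) for x, y in points]


def thicken(pixels: Set[Tuple[int, int]], radius: int) -> Set[Tuple[int, int]]:
    """Dilate a set of black pixels with a (2 * radius + 1)-wide square."""
    out = set()
    for x, y in pixels:
        for dx in range(-radius, radius + 1):
            for dy in range(-radius, radius + 1):
                out.add((x + dx, y + dy))
    return out


# A vertical stroke of height 2, doubled in size and slanted to the right.
stroke = [(0.0, 0.0), (0.0, 2.0)]
slanted = scale_and_skew(stroke, scale=2.0, skew=0.5)

# A single pixel thickened by one pixel in every direction.
blob = thicken({(0, 0)}, radius=1)
```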
The central thrust of the work is the presentation of a scheme for the recognition of printed Telugu characters. In view of the large number of characters (well over 2000) in the Telugu alphabet and the requirement that the recognition system should work irrespective of the size of the characters, the problem is formidable. The complexity of the problem is reduced by dividing it into two subproblems. This is made possible by the fact that all the possible Telugu characters can be realized by superimposing certain primitive shapes, called 'build-and-conjunct primitives', over twenty-five 'basic letters'. Hence, a two-stage recognition method has been implemented.
In the first stage of the recognition process, a knowledge-based, goal-oriented search is made to recognize and remove the primitive shapes that are present in the input character. A sequential template matching procedure which makes use of a directed curve-tracer is used for this purpose. After the removal of the primitive shapes, only a basic letter remains.
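The idea of the first stage can be sketched in a much simplified form. The version below treats a character and each template as a set of pixel coordinates and tests templates one after another; it omits the directed curve-tracer and the knowledge-based search order entirely, and the template names are hypothetical.

```python
from typing import Dict, List, Set, Tuple

Pixel = Tuple[int, int]


def strip_primitives(pixels: Set[Pixel],
                     templates: Dict[str, Set[Pixel]]
                     ) -> Tuple[List[str], Set[Pixel]]:
    """Test templates in order; remove each one found in the character."""
    remaining = set(pixels)
    found = []
    for name, shape in templates.items():
        if shape <= remaining:      # every template pixel is present
            found.append(name)
            remaining -= shape      # erase the matched primitive shape
    return found, remaining


# Toy character: a two-pixel "basic letter" with a two-pixel mark above it.
character = {(0, 0), (1, 0), (0, 3), (1, 3)}
templates = {
    "top_mark": {(0, 3), (1, 3)},   # hypothetical primitive shape
    "side_mark": {(5, 5)},          # hypothetical primitive shape, absent here
}
found, basic_letter = strip_primitives(character, templates)
```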
In the second stage, the basic letter is identified by using an on-the-curve coding procedure. It consists of tracing the pattern along its figure points and coding it such that each element in the code corresponds to a curved line segment of a particular orientation. Classification of the basic letter is accomplished by comparing the code of the input pattern with a dictionary of prototype codes of basic letters. The knowledge of the primitives and the basic letter present in a given character is then used in a decision tree to accomplish the overall classification.
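A minimal sketch of this kind of coding and dictionary lookup follows. It substitutes an eight-direction code with run collapsing and an edit-distance comparison for the thesis's own coding and matching rules, which the abstract does not specify; the prototype names and codes are invented for illustration.

```python
import math
from typing import Dict, List, Sequence, Tuple

Point = Tuple[float, float]


def direction_code(p: Point, q: Point) -> int:
    """Quantize the step from p to q into one of 8 directions (0 = +x)."""
    angle = math.atan2(q[1] - p[1], q[0] - p[0])
    return round(angle / (math.pi / 4)) % 8


def curve_code(points: Sequence[Point]) -> List[int]:
    """One code element per run of steps that share an orientation."""
    codes: List[int] = []
    for p, q in zip(points, points[1:]):
        c = direction_code(p, q)
        if not codes or codes[-1] != c:
            codes.append(c)
    return codes


def edit_distance(a: Sequence[int], b: Sequence[int]) -> int:
    """Levenshtein distance between two code strings."""
    prev = list(range(len(b) + 1))
    for i, x in enumerate(a, 1):
        cur = [i]
        for j, y in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1, prev[j - 1] + (x != y)))
        prev = cur
    return prev[-1]


def classify(code: Sequence[int], dictionary: Dict[str, List[int]]) -> str:
    """Return the name of the prototype whose code is closest to the input."""
    return min(dictionary, key=lambda name: edit_distance(code, dictionary[name]))


# Traced points: two steps to the right, then two steps upward.
traced = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2)]
code = curve_code(traced)

# Hypothetical prototype dictionary; real entries would be basic letters.
prototypes = {"corner": [0, 2], "bar": [0]}
label = classify(code, prototypes)
```

Collapsing runs of equal direction makes the code independent of how many points are traced along each segment, which is one simple way to make the comparison insensitive to character size.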
The two-stage recognition scheme has been simulated on an IBM 360/44 computer and tested on hand-digitized and computer-generated characters. The experimental performance corroborates the validity of the basic concepts upon which the recognition scheme is based.

