On Some Questions Involving Prefix Codes

Shivkumar, K M

View/Open

Thesis full text (761.5Kb)

Thesis abstract (114.1Kb)

Author

Shivkumar, K M

Metadata

Show full item record

Abstract

Let A be a finite alphabet and A be the set of all finite sequences of the elements of A. A word is any member of A . A prefix code X is a set of words satisfying the prefix property, i.e., no word in the set is a prefix of any other word in the set. If X is defined as the collection of all concatenations of the words of X, then it can be seen that each of its elements can be expressed as a concatenation of the words of X in a unique manner. Any subset of A possessing this property is called a uniquely decodable code and the prefix codes constitute an important subclass of uniquely decodable codes. In our work, we look into the following questions involving prefix codes: i) We first study a parameter associated with prefix codes. For a discrete source with source distribution P, the problem of constructing a prefix code over the alphabet A with the minimum expected length is one of the earliest problems addressed in information theory. Let LD(P) (D = jAj) denote the minimum expected length of a prefix code for this source. This LD(P) is the parameter of our interest and can be seen as a function—call it the minimum expected length function LD—over the set Pn of all probability mass functions (PMF) of the form (p1; p2; : : : ; pn). It is well known that LD attains its maximum value at the uniform distribution Un = (1=n; 1=n; : : : ; 1=n). However, a characterization of all the PMFs at which this function attains a maximum value has not been carried out before, which we do in this work. This characterization also suggests the following problem: do the restrictions of LD over certain subsets of Pn attain maximum values in their respective domains? If so, what are the PMFs at which these maximum values are attained? We give a partial solution to this problem for the binary case D = 2. ii) We introduce the problem of finding a minimum expected length binary prefix code (hereafter known as an optimal code) among the prefix codes that satisfy the following constraint: all the possible concatenations of the codewords must satisfy the (d; k) runlength-limited (RLL) constraint, i.e., the number of zeros between any two successive ones in them is at least d and the length of any run of consecutive zeros is at most k. For certain (d; k) pairs, we show that this problem can be reduced to a well-studied problem of finding a prefix code with the minimum expected cost when each letter of the alphabet has a non-negative cost associated with it. Also, for these (d; k) pairs, we examine if the optimal codes satisfy a certain maximality property defined with respect to the prefix condition and the RLL constraint. iii) We then study a property of prefix codes: of it being synchronous or not. A prefix code is said to be synchronous if there exists a word x 2 A such that for all w 2 A , we have wx 2 X . Capocelli et al. (1988) have given an algorithm to determine if a given prefix code is synchronous or not, which has subsequently been improved. In our work, we devise an algorithm based on the notion of dangling suffixes, similar to the classical Sardinas-Patterson test for determining whether or not a given code is uniquely decodable. We show that our algorithm has a much better worst-case performance when compared to that 2 of the improved version of the algorithm of Capocelli et al. In this process, we also slightly improve upon the known necessary and sufficient condition for a prefix code to be synchronous. iv) Finally we look into a class of prefix codes called the bifix codes. A bifix code is a prefix code in which no codeword is a suffix of another codeword. For a finite sequence of non-decreasing natural numbers L = (l1; l2; : : : ; ln), there is no known efficient algorithm to determine the existence of a bifix code whose sequence of codeword lengths is the same as L (henceforth referred to as a bifix code for L). For a finite sequence L taking only two distinct integer values (called a two-level sequence), we show that the problem of deciding the existence of a bifix code for L can be converted to a problem of finding a particular subset of vertices from certain graphs derived from de Bruijn graphs. We then conjecture an efficient way of finding these subsets. Ahlswede’s conjecture (1996) is another problem which has led to a lot of work in the field of bifix codes. It states that if a sequence L has a Kraft sum ( P i 2􀀀li ) less than or equal to 3=4, then there exists a binary bifix code for L. This conjecture has been proved when L is a two-level sequence. We give an alternate proof of this by pointing out a new general way of constructing a bifix code for a two-level sequence L with Kraft sum less than or equal to 3=4.

URI

https://etd.iisc.ac.in/handle/2005/4719

Collections

Electrical Communication Engineering (ECE) [492]