Flexible Constraint Length Viterbi Decoders On Large Wire-area Interconnection Topologies
Abstract
To achieve the goal of efficient ”anytime, anywhere” communication, it is essential to develop mobile devices which can efficiently support multiple wireless communication standards. Also, in order to efficiently accommodate the further evolution of these standards, it should be possible to modify/upgrade the operation of the mobile devices without having to recall previously deployed devices. This is achievable if as much functionality of the mobile device as possible is provided through software. A mobile device which fits this description is called a Software Defined Radio (SDR).
Reconfigurable hardware-based solutions are an attractive option for realizing SDRs as they can potentially provide a favourable combination of the flexibility of a DSP or a GPP and the efficiency of an ASIC. The work presented in this thesis discusses the development of efficient reconfigurable hardware for one of the most energy-intensive functionalities in the mobile device, namely, Forward Error Correction (FEC). FEC is required in order to achieve reliable transfer of information at minimal transmit power levels. FEC is achieved by encoding the information in a process called channel coding. Previous studies have shown that the FEC unit accounts for around 40% of the total energy consumption of the mobile unit. In addition, modern wireless standards also place the additional requirement of flexibility on the FEC unit. Thus, the FEC unit of the mobile device represents a considerable amount of computing ability that needs to be accommodated into a very small power, area and energy budget.
Two channel coding techniques have found widespread use in most modern wireless standards -namely convolutional coding and turbo coding. The Viterbi algorithm is most widely used for decoding convolutionally encoded sequences. It is possible to use this algorithm iteratively in order to decode turbo codes. Hence, this thesis specifically focusses on developing architectures for flexible Viterbi decoders. Chapter 2 provides a description of the Viterbi and turbo decoding techniques.
The flexibility requirements placed on the Viterbi decoder by modern standards can be divided into two types -code rate flexibility and constraint length flexibility. The code rate dictates the number of received bits which are handled together as a symbol at the receiver. Hence, code rate flexibility needs to be built into the basic computing units which are used to implement the Viterbi algorithm. The constraint length dictates the number of computations required per received symbol as well as the manner of transfer of results between these computations. Hence, assuming that multiple processing units are used to perform the required computations, supporting constraint length flexibility necessitates changes in the interconnection network connecting the computing units. A constraint length K Viterbi decoder needs 2K−1computations to be performed per received symbol. The results of the computations are exchanged among the computing units in order to prepare for the next received symbol. The communication pattern according to which these results are exchanged forms a graph called a de Bruijn graph, with 2K−1nodes. This implies that providing constraint length flexibility requires being able to realize de Bruijn graphs of various sizes on the interconnection network connecting the processing units.
This thesis focusses on providing constraint length flexibility in an efficient manner. Quite clearly, the topology employed for interconnecting the processing units has a huge effect on the efficiency with which multiple constraint lengths can be supported. This thesis aims to explore the usefulness of interconnection topologies similar to the de Bruijn graph, for building constraint length flexible Viterbi decoders. Five different topologies have been considered in this thesis, which can be discussed under two different headings, as done below:
De Bruijn network-based architectures
The interconnection network that is of chief interest in this thesis is the de Bruijn interconnection network itself, as it is identical to the communication pattern for a Viterbi decoder of a given constraint length. The problem of realizing flexible constraint length Viterbi decoders using a de Bruijn network has been approached in two different ways. The first is an embedding-theoretic approach where the problem of supporting multiple constraint lengths on a de Bruijn network is seen as a problem of embedding smaller sized de Bruijn graphs on a larger de Bruijn graph. Mathematical manipulations are presented to show that this embedding can generally be accomplished with a maximum dilation of, where N is the number of computing nodes in the physical network, while simultaneously avoiding any congestion of the physical links. In this case, however, the mapping of the decoder states onto the processing nodes is assumed fixed.
Another scheme is derived based on a variable assignment of decoder states onto computing nodes, which turns out to be more efficient than the embedding-based approach. For this scheme, the maximum number of cycles per stage is found to be limited to 2 irrespective of the maximum contraint length to be supported. In addition, it is also found to be possible to execute multiple smaller decoders in parallel on the physical network, for smaller constraint lengths. Consequently, post logic-synthesis, this architecture is found to be more area-efficient than the architecture based on the embedding theoretic approach. It is also a more efficiently scalable architecture.
Alternative architectures
There are several interconnection topologies which are closely connected to the de Bruijn graph, and hence could form attractive alternatives for realizing flexbile constraint length Viterbi decoders. We consider two more topologies from this class -namely, the shuffle-exchange network and the flattened butterfly network. The variable state assignment scheme developed for the de Bruijn network is found to be directly applicable to the shuffle-exchange network. The average number of clock cycles per stage is found to be limited to 4 in this case. This is again independent of the constraint length to be supported. On the flattened butterfly (which is actually identical to the hypercube), a state scheduling scheme similar to that of bitonic sorting is used. This architecture is found to offer the ideal throughput of one decoded bit every clock cycle, for any constraint length.
For comparison with a more general purpose topology, we consider a flexible constraint length Viterbi decoder architecture based on a 2D-mesh, which is a popular choice for general purpose applications, as well as many signal processing applications. The state scheduling scheme used here is also similar to that used for bitonic sorting on a mesh.
All the alternative architectures are capable of executing multiple smaller decoders in parallel on the larger interconnection network.
Inferences
Following logic synthesis and power estimation, it is found that the de Bruijn network-based architecture with the variable state assignment scheme yields the lowest (area)−(time) product, while the flattened butterfly network-based architecture yields the lowest (area) - (time)2product. This means, that the de Bruijn network-based architecture is the best choice for moderate throughput applications, while the flattened butterfly network-based architecture is the best choice for high throughput applications. However, as the flattened butterfly network is less scalable in terms of size compared to the de Bruijn network, it can be concluded that among the architectures considered in this thesis, the de Bruijn network-based architecture with the variable state assignment scheme is overall an attractive choice for realizing flexible constraint length Viterbi decoders.