Adaptive subband coding of audio signals using spectral and temporal masking properties

Satheesh, S

Abstract

Audio coding, or audio compression, is the technique of reducing the bit rate of digital audio signals in order to reduce the rate required for transmission or the space required for storage. The challenging requirements of lower bit rates and higher quality make audio coding an active research area. In audio signals, bit-rate reduction is achieved through the removal of signal redundancy and perceptually irrelevant parts. Perceptual audio coders exploit the perceptual properties of the human auditory system (HAS) to remove perceptual irrelevancies in the audio signal. The phenomenon of auditory masking is exploited for differential bit allocation such that the quantization noise is rendered inaudible by the signal itself. These coders also achieve redundancy removal by exploiting the statistical correlation in the audio signal. The term perceptual entropy (PE) refers to the minimum rate, in bits per sample, at which a signal can be coded with no perceivable distortion, determined through rigorous psychoacoustic listening tests. Traditional perceptual audio coders use uniform subband decomposition or a fixed orthogonal transform to obtain a time-frequency mapping of the audio signal. However, the psychoacoustic model used to allocate bits to the subband outputs has a fixed, non-uniform frequency resolution. Given the high degree of spectral and temporal variation in audio signals, an ideal coder should adapt its time-frequency decomposition and the corresponding bit allocation optimally. The present research is aimed at finding an optimum time-frequency decomposition that achieves minimum perceptual entropy on a frame-to-frame basis for a variety of audio signals. Different strategies are devised for stationary and transient signals. For stationary signals, the optimum tree decomposition that minimizes the PE is determined from the signal and its psychoacoustic masked threshold.
At each node of the tree, the subband filter outputs are quantized using the bit allocation designed to satisfy the condition for transparent encoding. The number of bits required for the transparent encoding of a particular subband is called the subband perceptual entropy (SPE). The tree is traversed from the root node (i.e., the time-domain signal itself) onwards, and a decision to further split a node is taken only if the SPE of the parent node is greater than the sum of the SPEs of the two child nodes. If this criterion is not met, no further splitting is done from that node. The resultant unbalanced binary-tree decomposition is optimum in terms of minimum PE among all subband decompositions up to the maximum depth of the tree. The filterbank itself is constrained by the pair of perfect-reconstruction (PR) analysis/synthesis filter impulse responses. When the audio signal has sharp transients, the quantization noise spreads across the entire transform window length in the case of a transform coder; for a subband coder, the temporal spreading of quantization noise is governed by the effective length of the impulse response of the synthesis filters. This quantization noise is heard as a significant artifact referred to as pre-echo. Existing methods of pre-echo reduction do not make explicit use of the temporal masking models of the HAS. In this research, a psychoacoustic experiment was conducted to determine the temporal masking threshold pattern of transient signals. The temporal masking threshold pattern is used to implement a temporally varying bit-allocation scheme for encoding the prediction residual. Because closed-loop DPCM is used, there is no further spreading of quantization noise. Thus, a switched DPCM/adaptive subband coder is developed, which switches to the DPCM mode when the input frame to be encoded is identified as a transient. Different approaches to the implementation of the adaptive subband coder are explored.
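The remark that closed-loop DPCM does not spread quantization noise can be illustrated with a minimal sketch. It is not the thesis implementation: the first-order predictor, its coefficient `a`, and the per-sample step sizes (standing in for a temporal masking threshold pattern) are all assumptions chosen for illustration.

```python
def dpcm_closed_loop(x, steps, a=0.9):
    """Closed-loop first-order DPCM with a per-sample quantizer step size.

    x     : input samples
    steps : hypothetical step sizes, one per sample (a coarser step
            where more noise is assumed to be temporally masked)
    a     : hypothetical predictor coefficient
    Returns the reconstruction the decoder would also produce.
    """
    recon = []
    prev = 0.0
    for sample, step in zip(x, steps):
        pred = a * prev                  # predict from the reconstructed past
        residual = sample - pred         # prediction residual
        index = round(residual / step)   # uniform quantizer index
        prev = pred + index * step       # same reconstruction as the decoder
        recon.append(prev)
    return recon
```

Because the prediction is formed from reconstructed samples, the error at each sample equals the quantization error of that sample's residual alone, so it stays within half of that sample's step size and does not leak into neighbouring samples.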
The performance of the new coding methods is examined both objectively and subjectively. The subjective evaluation is performed using a small-size, formal psychoacoustic listening test. The objective and subjective test results show that the new coder provides transparent encoding for a variety of audio signals. The performance is also comparable to that of the MPEG-1 Layer III coder at equivalent bit rates. The adaptation side information of the coder is encoded using a lossless arithmetic coding technique, and its performance for various symbol definitions is also studied.
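The node-splitting rule described in the abstract (split a node only if its SPE exceeds the sum of the SPEs of its two children) can be sketched as follows. This is an illustration under stated assumptions, not the thesis code: `spe_bits` is a hypothetical stand-in for the SPE based on a signal-to-noise ratio, the masked threshold is reduced to a single flat allowed noise power, and the orthonormal Haar pair stands in for the perfect-reconstruction filters.

```python
import math

def spe_bits(band, noise_power):
    """Hypothetical SPE stand-in: bits needed to keep the quantization
    noise of this band at or below the allowed noise power."""
    if not band:
        return 0.0
    power = sum(v * v for v in band) / len(band)
    if power <= noise_power:
        return 0.0
    return len(band) * 0.5 * math.log2(power / noise_power)

def haar_split(band):
    """Two-channel orthonormal Haar analysis, decimated by 2
    (assumed here as a simple perfect-reconstruction pair)."""
    s = math.sqrt(0.5)
    pairs = range(0, len(band) - 1, 2)
    low = [(band[i] + band[i + 1]) * s for i in pairs]
    high = [(band[i] - band[i + 1]) * s for i in pairs]
    return low, high

def grow_tree(band, noise_power, max_depth, depth=0):
    """Return (tree, bits). tree is None for a leaf, otherwise a
    (low_subtree, high_subtree) tuple; bits is the total over leaves."""
    parent_bits = spe_bits(band, noise_power)
    if depth == max_depth or len(band) < 2:
        return None, parent_bits
    low, high = haar_split(band)
    child_bits = spe_bits(low, noise_power) + spe_bits(high, noise_power)
    # Split only if the parent's SPE is greater than the children's sum.
    if parent_bits <= child_bits:
        return None, parent_bits
    low_tree, low_bits = grow_tree(low, noise_power, max_depth, depth + 1)
    high_tree, high_bits = grow_tree(high, noise_power, max_depth, depth + 1)
    return (low_tree, high_tree), low_bits + high_bits
```

For a slowly varying input such as `[math.sin(2 * math.pi * 2 * n / 64) for n in range(64)]`, most of the energy falls in the low band, so the root is split and the total bit count drops below the SPE of the unsplit frame. A real coder would replace the flat `noise_power` with per-band masked thresholds from a psychoacoustic model.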

URI

https://etd.iisc.ac.in/handle/2005/9298

Collections

Electrical Communication Engineering (ECE) [518]