Adaptive subband coding of audio signals using spectral and temporal masking properties

Satheesh, S

dc.contributor.advisor	Sreenivas T V
dc.contributor.author	Satheesh, S
dc.date.accessioned	2026-03-12T10:46:27Z
dc.date.available	2026-03-12T10:46:27Z
dc.date.submitted	2001
dc.identifier.uri	https://etd.iisc.ac.in/handle/2005/9298
dc.description.abstract	Audio coding, or audio compression, is the technique of reducing the bit rate of digital audio signals in order to lower the bitrate required for transmission or the space required for storage. The challenging requirements of lower bitrates and higher quality make audio coding an active research area. In audio signals, bit-rate reduction is achieved through the removal of signal redundancy and perceptually irrelevant parts. Perceptual audio coders exploit the perceptual properties of the human auditory system (HAS) to remove perceptual irrelevancies in the audio signal. The phenomenon of auditory masking is exploited for differential bit allocation such that the quantization noise is rendered inaudible by the signal itself. These coders also achieve redundancy removal by exploiting the statistical correlation in the audio signal. The term perceptual entropy (PE) refers to the minimum rate, in bits per sample, at which a signal can be coded with no perceivable distortion, determined through rigorous psychoacoustic listening tests. Traditional perceptual audio coders use uniform subband decomposition or a fixed orthogonal transform for obtaining a time-frequency mapping of the audio signal. However, the psychoacoustic model used to allocate bits to the subband outputs has a fixed, non-uniform frequency resolution. Given the high degree of spectral and temporal variations associated with audio signals, an ideal coder should adapt its time-frequency decomposition and the corresponding bit allocation optimally. The present research is aimed at finding an optimum time-frequency decomposition to achieve minimum perceptual entropy on a frame-to-frame basis for a variety of audio signals. Different strategies are devised for stationary and transient signals. For stationary signals, the optimum tree decomposition that minimizes the PE is determined from the signal and its psychoacoustic masked threshold.
At each node of the tree, the subband filter outputs are quantized using the bit allocation designed to satisfy the condition for transparent encoding. The number of bits required for the transparent encoding of a particular subband is called the subband perceptual entropy (SPE). The tree is traversed from the root node (i.e., the time-domain signal itself) onwards, and a decision to further split a node is taken only if the SPE of the parent node is greater than the sum of the SPEs of the two child nodes. If this criterion is not met, no further splitting is done from that node. The resultant unbalanced binary-tree decomposition is optimum in terms of minimum PE among all subband decompositions up to the maximum depth of the tree. The filterbank itself is constrained by the pair of perfect-reconstruction (PR) analysis/synthesis filter impulse responses. When the audio signal has sharp transients, the quantization noise spreads across the entire transform window length in the case of a transform coder, and for a subband coder, the temporal spreading of quantization noise is governed by the effective length of the impulse response of the synthesis filters. This quantization noise is heard as a significant artifact referred to as pre-echo. Existing methods of pre-echo reduction do not make explicit use of the temporal masking models of the HAS. In this research, a psychoacoustic experiment was conducted to determine the temporal masking threshold pattern of transient signals. The temporal masking threshold pattern is used to implement a temporally varying bit-allocation scheme for encoding the prediction residual. Because closed-loop DPCM is used, there is no further spreading of quantization noise. Thus, a switched DPCM/adaptive subband coder is developed, which switches to the DPCM mode when the input frame to be encoded is identified as a transient. Different approaches to the implementation of the adaptive subband coder are explored.
The performance of the new coding methods is examined both objectively and subjectively. The subjective evaluation is performed using a small-size, formal psychoacoustic listening test. The objective and subjective test results show that the new coder provides transparent encoding for a variety of audio signals. The performance is also comparable to that of the MPEG-1 Layer III coder at equivalent bitrates. The adaptation side information of the coder is encoded using a lossless arithmetic coding technique, and its performance for various symbol definitions is also studied.
dc.language.iso	en_US
dc.relation.ispartofseries	T05107
dc.rights	I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.
dc.subject	Perceptual Audio Coding
dc.subject	Perceptual Entropy
dc.subject	Subband Decomposition
dc.title	Adaptive subband coding of audio signals using spectral and temporal masking properties
dc.type	Thesis
dc.degree.name	MSc Engg
dc.degree.level	Masters
dc.degree.grantor	Indian Institute of Science
dc.degree.discipline	Engineering
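
The split rule described in the abstract (split a node only if the parent's SPE exceeds the sum of the two children's SPEs) can be illustrated with a minimal sketch. Everything named here is an assumption made for illustration and is not taken from the thesis: the Haar pair stands in for the perfect-reconstruction filters, and toy_spe uses a flat noise floor in place of a psychoacoustic masked threshold.

```python
import math


def haar_split(x):
    """Split x into half-rate low and high bands with the orthonormal Haar pair.

    Assumption: Haar is used only because it is the simplest
    perfect-reconstruction pair. Expects an even-length input; a trailing
    odd sample would be dropped.
    """
    s = math.sqrt(0.5)
    low, high = [], []
    for a, b in zip(x[0::2], x[1::2]):
        low.append(s * (a + b))
        high.append(s * (a - b))
    return low, high


def toy_spe(band, noise_floor=1e-3):
    """Hypothetical stand-in for the subband perceptual entropy (SPE).

    Counts the bits needed to keep quantization noise below a flat noise
    floor, using 0.5 * log2(power / floor) bits per sample. A real coder
    would use a frequency-dependent masked threshold instead.
    """
    if not band:
        return 0.0
    power = sum(v * v for v in band) / len(band)
    if power <= 0.0:
        return 0.0
    bits_per_sample = max(0.0, 0.5 * math.log2(power / noise_floor))
    return len(band) * bits_per_sample


def grow_tree(band, cost=toy_spe, depth=0, max_depth=4):
    """Grow an unbalanced binary tree from the root downwards.

    A node is split only if its own cost is strictly greater than the sum
    of the costs of its two children; otherwise it becomes a leaf.
    """
    parent_cost = cost(band)
    if depth < max_depth and len(band) >= 2:
        low, high = haar_split(band)
        if parent_cost > cost(low) + cost(high):
            children = [
                grow_tree(low, cost, depth + 1, max_depth),
                grow_tree(high, cost, depth + 1, max_depth),
            ]
            total = sum(child["cost"] for child in children)
            return {"cost": total, "children": children}
    return {"cost": parent_cost, "children": []}


if __name__ == "__main__":
    frame = [1.0] * 16  # strongly low-pass toy frame
    tree = grow_tree(frame)
    print("unsplit cost:", toy_spe(frame))
    print("tree cost:   ", tree["cost"])
```

In this sketch the cost function is passed in as a parameter, so a masked-threshold-based SPE could replace toy_spe without changing the traversal.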

Files in this item

Name: T05107.pdf
Size: 3.372Mb
Format: PDF

This item appears in the following Collection(s)

Electrical Communication Engineering (ECE) [518]