Compressed Domain Processing of MPEG Audio

Anantharaman, B

dc.contributor.advisor	Ramakrishnan, K R
dc.contributor.author	Anantharaman, B
dc.date.accessioned	2005-02-11T06:36:18Z
dc.date.accessioned	2018-07-31T07:05:42Z
dc.date.available	2005-02-11T06:36:18Z
dc.date.available	2018-07-31T07:05:42Z
dc.date.issued	2005-02-11T06:36:18Z
dc.date.submitted	2001
dc.identifier.uri	https://etd.iisc.ac.in/handle/2005/3914
dc.identifier.srno	null
dc.description.abstract	MPEG audio compression techniques significantly reduce the storage and transmission requirements for high quality digital audio. However, compression complicates the processing of audio in many applications. If a compressed audio signal is to be processed, the direct method is to decode the compressed signal, process the decoded signal and re-encode it. This is computationally expensive due to the complexity of the MPEG filter bank. This thesis deals with the processing of MPEG compressed audio. The main contributions of this thesis are a) extracting wavelet coefficients in the MPEG compressed domain, b) wavelet based pitch extraction in the MPEG compressed domain, c) time scale modification of MPEG audio, and d) watermarking of MPEG audio. The research contributions start with a technique for calculating several levels of wavelet coefficients from the output of the MPEG analysis filter bank. The technique exploits the Toeplitz structure which arises when the MPEG and wavelet filter banks are represented in matrix form. The computational complexity of extracting several levels of wavelet coefficients after decoding the compressed signal is compared with that of extracting them directly from the output of the MPEG analysis filter bank. The proposed technique is found to be computationally efficient for extracting higher levels of wavelet coefficients. Extracting pitch in the compressed domain becomes essential when large multimedia databases need to be indexed. For example, one may be interested in listening to a particular speaker, or in listening only to the male or female audio segments of a multimedia document. For this application, pitch information is one of the most basic and important features required. The pitch period is the time interval between two successive glottal closures. Glottal closures are accompanied by sharp transients in the speech signal, which in turn give rise to local maxima in the wavelet coefficients.
The pitch period can therefore be calculated by finding the time interval between two successive maxima in the wavelet coefficients. It is shown that the computational complexity of extracting pitch in the compressed domain is less than 7% of that of uncompressed domain processing. An algorithm for extracting pitch in the compressed domain is proposed. Results of this algorithm are reported for synthetic signals and for words uttered by male and female speakers. In a number of important applications, one needs to modify an audio signal to render it more useful than the original. Typical applications include changing the time evolution of an audio signal (increasing or decreasing the rate of articulation of a speaker) or adapting a given audio sequence to a given video sequence. In this thesis, time scale modifications are obtained in the subband domain such that when the modified subband signals are given to the MPEG synthesis filter bank, the desired time scale modification of the decoded signal is achieved. This is done by making use of sinusoidal modeling [1]. Here, each subband signal is modeled in terms of parameters such as amplitude, phase and frequency, and is subsequently synthesised from these parameters with Ls = k La, where Ls is the length of the synthesis window, k is the time scale factor and La is the length of the analysis window. As the PCM version of the time scaled signal is not available, psychoacoustic model based bit allocation cannot be used. Hence a new bit allocation is done by using a subband coding algorithm. This method has been satisfactorily tested for time scale expansion and compression of speech and music signals. The recent growth of multimedia systems has increased the need for protecting digital media. Digital watermarking has been proposed as a method for protecting digital documents. The watermark needs to be added to the signal in such a way that it does not cause audible distortions.
However, the idea behind lossy MPEG encoders is to remove or make insignificant those portions of the signal which do not affect human hearing. This renders the watermark insignificant, and hence proving ownership of the signal becomes difficult when an audio signal is compressed. The existing compressed domain methods merely change the bits or the scale factors according to a key. Though simple, these methods are not robust to attacks. Further, these methods require the original signal to be available in the verification process. In this thesis we propose a watermarking method based on the spread spectrum technique which does not require the original signal during the verification process. It is also shown to be more robust than the existing methods. In our method the watermark is spread across many subband samples. Here two factors need to be considered: a) the watermark is to be embedded only in those subbands in which the added noise will be inaudible, and b) the watermark should be added to those subbands which have sufficient bit allocation, so that the watermark does not become insignificant due to lack of bit allocation. Embedding the watermark in the lower subbands would cause distortion, and embedding it in the higher subbands would prove futile as the bit allocation in these subbands is practically zero. Considering all these factors, one can introduce noise to samples across many frames corresponding to subbands 4 to 8. In the verification process, it is sufficient to have the key/code and the possibly attacked signal. This method has been satisfactorily tested for robustness to scale factor changes, LSB changes, and MPEG decoding and re-encoding.	en
dc.format.extent	2757785 bytes
dc.format.mimetype	application/pdf
dc.language.iso	en
dc.publisher	Indian Institute of Science	en
dc.rights	I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.	en
dc.subject.classification	Electrical Communications	en
dc.subject.keyword	MPEG Audio Coding Digital Technique	en
dc.subject.keyword	Audio Signal Processing	en
dc.subject.keyword	Least Significant Bit (LSB)	en
dc.subject.keyword	Audio Signals Compression	en
dc.subject.keyword	Wavelet Coefficients	en
dc.subject.keyword	Time Scale Modifications	en
dc.subject.keyword	Sinusoidal Model	en
dc.subject.keyword	Compressed Domain	en
dc.subject.keyword	Wavelet Based Pitch Extraction	en
dc.subject.keyword	Audio Watermarking	en
dc.title	Compressed Domain Processing of MPEG Audio	en
dc.type	Electronic Thesis and Dissertation	en
dc.degree.name	MSc Engg.	en
dc.degree.level	Masters	en
dc.degree.grantor	Indian Institute of Science	en
dc.degree.discipline	Faculty of Engineering	en
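
The spread spectrum watermarking idea summarised in the abstract (keyed noise added to subbands 4 to 8 across many frames, verified using only the key and the possibly attacked signal) can be illustrated with a minimal sketch. This is not the thesis's implementation: the function names `pn_sequence`, `embed` and `detect`, the `strength` parameter, the use of +1/-1 chips, the 0-based subband indexing and the representation of a frame as a plain list of 32 subband samples are all assumptions made for illustration, and no psychoacoustic or bit allocation checks are modelled.

```python
import random

# Assumption: subbands are indexed from 0, so "subbands 4 to 8" maps to indices 4..8.
SUBBANDS = range(4, 9)

def pn_sequence(key, n_frames):
    """Keyed pseudo-noise chips (+1 or -1), one per (frame, subband) sample."""
    rng = random.Random(key)
    return [[rng.choice((-1.0, 1.0)) for _ in SUBBANDS] for _ in range(n_frames)]

def embed(frames, key, strength=0.05):
    """Return a copy of `frames` with low-level keyed noise added to subbands 4-8.

    frames: list of frames, each a list of 32 subband sample values (hypothetical layout).
    """
    chips = pn_sequence(key, len(frames))
    marked = [list(frame) for frame in frames]
    for frame, row in zip(marked, chips):
        for sb, chip in zip(SUBBANDS, row):
            frame[sb] += strength * chip
    return marked

def detect(frames, key):
    """Blind detection: mean correlation of subbands 4-8 with the keyed chips.

    Only the key and the signal under test are needed, not the original signal.
    """
    chips = pn_sequence(key, len(frames))
    total = sum(
        frame[sb] * chip
        for frame, row in zip(frames, chips)
        for sb, chip in zip(SUBBANDS, row)
    )
    return total / (len(frames) * len(SUBBANDS))

if __name__ == "__main__":
    # Synthetic stand-in for decoded subband samples, not real MPEG data.
    rng = random.Random(0)
    host = [[rng.gauss(0.0, 0.1) for _ in range(32)] for _ in range(2000)]
    marked = embed(host, key=1234)
    print("correct key:", detect(marked, key=1234))
    print("wrong key:  ", detect(marked, key=9999))
```

With the correct key the correlation concentrates near `strength`, while a wrong key or an unmarked signal gives a value near zero, so a threshold between the two (for example `strength / 2`) would serve as the decision rule in this sketch. Spreading the chips over many frames is what makes the averaged correlation stand out from the host signal.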

Files in this item

Name: Compressed.pdf
Size: 2.630Mb
Format: PDF

This item appears in the following Collection(s)

Electrical Engineering (EE) [423]
