Sparsity Motivated Auditory Wavelet Representation and Blind Deconvolution
Abstract
In many scenarios, events such as singularities and transients that carry important information about a signal undergo spreading during acquisition or transmission, and it is important to localize the events. For example, edges in an image and point sources in a microscopy or astronomical image are blurred by the point-spread function (PSF) of the acquisition system, while in a speech signal, the epochs corresponding to glottal closure instants are shaped by the vocal-tract response. Such events can be extracted with the help of techniques that promote sparsity, which enables separation of the smooth components from the transient ones. In this thesis, we consider the development of such sparsity-promoting techniques. The contributions of the thesis are threefold: (i) an auditory-motivated continuous wavelet design and representation, which helps identify singularities; (ii) a sparsity-driven deconvolution technique; and (iii) a sparsity-driven deconvolution technique for reconstruction of finite-rate-of-innovation (FRI) signals. We use the speech signal to illustrate the performance of the techniques in the first two parts and super-resolution microscopy (2-D) for the third part.
In the first part, we develop a continuous wavelet transform (CWT) starting from an auditory motivation. Wavelet analysis provides good time and frequency localization, which has made it a popular tool for time-frequency analysis of signals. The CWT is a multiresolution analysis tool that involves decomposition of a signal using a constant-Q wavelet filterbank, akin to the time-frequency analysis performed by the basilar membrane in the peripheral human auditory system. This connection motivated us to develop wavelets that possess auditory localization capabilities. Gammatone functions are extensively used in the modeling of the basilar membrane, but the nonzero average of the functions poses a hurdle, since an admissible wavelet must have zero mean. We construct bona fide wavelets from the Gammatone function, called Gammatone wavelets, and analyze their properties such as admissibility, time-bandwidth product, and vanishing moments.
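The zero-mean hurdle can be checked numerically. The sketch below is illustrative only: it assumes the common Gammatone form t^(n-1) exp(-2πbt) cos(2πft), picks hypothetical parameter values, and uses differentiation as one simple way to obtain a zero-mean function. It is not necessarily the construction used in the thesis.

```python
import numpy as np

def gammatone(t, order=4, bandwidth_hz=100.0, center_hz=200.0):
    """Gammatone function t^(order-1) * exp(-2*pi*b*t) * cos(2*pi*f*t) for t >= 0.

    The parameter values are hypothetical and chosen only for illustration.
    """
    t = np.asarray(t, dtype=float)
    envelope = t ** (order - 1) * np.exp(-2.0 * np.pi * bandwidth_hz * t)
    return envelope * np.cos(2.0 * np.pi * center_hz * t)

def relative_mean(samples):
    """|sum| / sum|.|: zero for a zero-mean sequence, one for a single-signed one."""
    return abs(np.sum(samples)) / np.sum(np.abs(samples))

fs = 16_000                          # sampling rate in Hz (assumed)
t = np.arange(0.0, 0.1, 1.0 / fs)    # long enough for the envelope to decay
g = gammatone(t)

# Forward-difference derivative. Its sum telescopes to g[-1] - g[0], which is
# essentially zero because the Gammatone starts at zero and decays to zero.
dg = np.diff(g) * fs

mean_ratio_gammatone = relative_mean(g)
mean_ratio_derivative = relative_mean(dg)
print(mean_ratio_gammatone, mean_ratio_derivative)
```

For these parameters the Gammatone function has a small but clearly nonzero relative mean, whereas the differentiated function has zero mean up to rounding error, which is the property a wavelet requires.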
Of particular interest is the vanishing moments property, which enables the wavelet to suppress smooth regions in a signal, leading to sparsification. We show how this property of the Gammatone wavelets, coupled with multiresolution analysis, could be employed for singularity and transient detection. Using these wavelets, we also construct equivalent filterbank models and obtain cepstral feature vectors out of such a representation. We show that the Gammatone wavelet cepstral coefficients (GWCC) are effective for robust speech recognition in comparison with mel-frequency cepstral coefficients (MFCC).
In the second part, we consider the problem of sparse blind deconvolution (SBD) starting from a signal obtained as the convolution of an unknown PSF and a sparse excitation. The blind deconvolution problem is ill-posed, and the goal is to employ sparsity to come up with an accurate solution. We formulate the SBD problem within a Bayesian framework. The estimation of the filter and the excitation involves optimization of a cost function that consists of an ℓ2 data-fidelity term and an ℓp-norm (p ∈ [0, 1]) regularizer as the sparsity-promoting prior. Since the ℓp-norm is not differentiable at the origin, we consider a smoothed version of the ℓp-norm as a proxy in the optimization. Apart from the regularizer being nonconvex, the data term is also nonconvex in the filter and excitation, as they are both unknown. We optimize the nonconvex cost using an alternating minimization strategy and develop an alternating ℓp-ℓ2 projections algorithm (ALPA). We demonstrate convergence of the iterative algorithm, analyze in detail the role of the pseudo-inverse solution as an initialization for ALPA, and provide probabilistic bounds on its accuracy considering the presence of noise and the condition number of the linear system of equations. We also consider the case of bounded noise and derive tight tail bounds using the Hoeffding inequality.
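To make the alternating strategy concrete, here is a minimal sketch of a generic alternating minimization for a cost of the form ||y - h * x||² + λ Σ (x_n² + ε)^(p/2). It is not the thesis's ALPA. The smoothing form, the reweighted least-squares update for the excitation, the initialization, and all parameter values are assumptions made for illustration, and the scale ambiguity between filter and excitation is left unhandled.

```python
import numpy as np

def conv_matrix(v, n_cols):
    """Matrix C of shape (len(v) + n_cols - 1, n_cols) with C @ u == np.convolve(v, u)."""
    C = np.zeros((len(v) + n_cols - 1, n_cols))
    for j in range(n_cols):
        C[j:j + len(v), j] = v
    return C

def smoothed_lp_cost(y, h, x, lam, p, eps):
    """l2 data-fidelity term plus a smoothed lp penalty on the excitation."""
    residual = y - np.convolve(h, x)
    return residual @ residual + lam * np.sum((x ** 2 + eps) ** (p / 2))

def alternating_sbd(y, filt_len, lam=0.05, p=0.5, eps=1e-6, n_iter=20):
    n = len(y) - filt_len + 1
    h = np.zeros(filt_len)
    h[0] = 1.0                      # hypothetical initial filter: a unit impulse
    x = y[:n].copy()                # hypothetical initial excitation
    costs = [smoothed_lp_cost(y, h, x, lam, p, eps)]
    for _ in range(n_iter):
        # Excitation step: one reweighted least-squares solve, which minimizes
        # a quadratic majorizer of the smoothed lp penalty at the current x.
        H = conv_matrix(h, n)
        w = (p / 2) * (x ** 2 + eps) ** (p / 2 - 1)
        x = np.linalg.solve(H.T @ H + lam * np.diag(w), H.T @ y)
        # Filter step: ordinary least squares with the excitation held fixed.
        X = conv_matrix(x, filt_len)
        h = np.linalg.lstsq(X, y, rcond=None)[0]
        costs.append(smoothed_lp_cost(y, h, x, lam, p, eps))
    return h, x, np.array(costs)

# Synthetic example: three spikes filtered by a short decaying oscillation.
rng = np.random.default_rng(0)
x_true = np.zeros(120)
x_true[[15, 50, 90]] = [1.0, -0.7, 0.9]
k = np.arange(12)
h_true = np.exp(-0.3 * k) * np.cos(0.8 * k)
y = np.convolve(h_true, x_true) + 0.01 * rng.standard_normal(131)

h_est, x_est, costs = alternating_sbd(y, filt_len=12)
print(costs[0], costs[-1])
```

Each half-step cannot increase the cost: the excitation update minimizes a majorizer of it, and the filter update minimizes the data term exactly. The recorded cost sequence is therefore non-increasing up to rounding error.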
As an application, we consider the problem of blind deconvolution of speech signals. In the linear model for speech production, voiced speech is assumed to be the result of a quasi-periodic impulse train exciting a vocal-tract filter. The locations of the impulses, or epochs, indicate the glottal closure instants, and the spacing between them indicates the pitch. Hence, the excitation in the case of voiced speech is sparse, and its deconvolution from the vocal-tract filter is posed as an SBD problem. We employ ALPA for SBD and show that the excitation obtained is sparser than the excitations obtained using sparse linear prediction, the smoothed ℓ1/ℓ2 sparse blind deconvolution algorithm, and majorization-minimization-based sparse deconvolution techniques. We also consider the problem of epoch estimation and show that the epochs estimated by ALPA in both clean and noisy conditions are closer to the instants indicated by the electroglottograph when compared with the estimates provided by the zero-frequency filtering technique, which is the state-of-the-art epoch estimation technique.
In the third part, we consider the problem of deconvolution of a specific class of continuous-time signals called finite-rate-of-innovation (FRI) signals, which are not bandlimited, but are specified by a finite number of parameters over an observation interval. The signal is assumed to be a linear combination of delayed versions of a prototypical pulse. The reconstruction problem is posed as a 2-D SBD problem. The kernel is assumed to have a known form but with unknown parameters. Given the sampled version of the FRI signal, the delays quantized to the nearest point on the sampling grid are first estimated using a proximal-operator-based alternating ℓp-ℓ2 algorithm (ALPA-prox), and then super-resolved to obtain off-grid (OG) estimates using gradient-descent optimization. The overall technique is termed OG-ALPA-prox.
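The grid-then-refine idea can be illustrated with a 1-D toy example. This is a sketch under stated assumptions, not the thesis's OG-ALPA-prox: it uses a single Gaussian pulse of known width, takes the peak sample as a stand-in for the on-grid estimate, and uses a hand-picked step size. The function names are hypothetical.

```python
import numpy as np

def gaussian_pulse(t, delay, sigma):
    """Unit-amplitude Gaussian pulse centred at `delay` (assumed pulse shape)."""
    return np.exp(-((t - delay) ** 2) / (2.0 * sigma ** 2))

def refine_delay(y, t, delay_init, sigma, step=1.0, n_iter=100):
    """Refine a delay estimate by gradient descent on 0.5 * ||y - amp * g(delay)||^2.

    The amplitude is re-estimated by least squares at every iteration. The step
    size is hand-picked for unit amplitude and sigma = 2; it is not a general rule.
    """
    delay = float(delay_init)
    for _ in range(n_iter):
        g = gaussian_pulse(t, delay, sigma)
        amp = (y @ g) / (g @ g)
        residual = y - amp * g
        # d/d(delay) of the cost, using dg/d(delay) = g * (t - delay) / sigma^2.
        grad = -amp * np.sum(residual * g * (t - delay)) / sigma ** 2
        delay -= step * grad
    g = gaussian_pulse(t, delay, sigma)
    return delay, (y @ g) / (g @ g)

# Samples on an integer grid of a pulse whose true delay lies between grid points.
t = np.arange(32, dtype=float)
true_delay, true_amp, sigma = 10.3, 1.0, 2.0
y = true_amp * gaussian_pulse(t, true_delay, sigma)

grid_delay = t[np.argmax(y)]          # stand-in for the on-grid estimate
refined_delay, refined_amp = refine_delay(y, t, grid_delay, sigma)
print(grid_delay, refined_delay, refined_amp)
```

In this noise-free setting the on-grid estimate is off by the quantization error, and the gradient-descent refinement moves it to the true off-grid delay.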
We show the application of OG-ALPA-prox to a particular modality of super-resolution microscopy (SRM), called stochastic optical reconstruction microscopy (STORM).
The resolution of the traditional optical microscope is limited by diffraction, and this limit is termed Abbe's limit. The goal of SRM is to engineer the optical imaging system to resolve structures in specimens, such as proteins, whose dimensions are smaller than the diffraction limit. The specimen to be imaged is tagged or labeled with light-emitting or fluorescent chemical compounds called fluorophores. These compounds specifically bind to proteins and exhibit fluorescence upon excitation. The fluorophores are assumed to be point sources, and the light emitted by them undergoes spreading due to diffraction. STORM employs a sequential approach, wherein, in each step, only a few fluorophores are randomly excited and the image is captured by a sensor array. The obtained image is diffraction-limited; however, the separation between the fluorophores allows for localizing the point sources with high precision. The localization is performed using Gaussian peak fitting. This process of random excitation coupled with localization is performed sequentially, and the results are subsequently consolidated to obtain a high-resolution image. We pose the localization as an SBD problem and employ OG-ALPA-prox to estimate the locations. We also report comparisons with the de facto standard Gaussian peak fitting algorithm and show that the statistical performance of OG-ALPA-prox is superior. Experimental results on real data show that the reconstruction quality is on par with that of Gaussian peak fitting.