Nonstationary Techniques For Signal Enhancement With Applications To Speech, ECG, And Nonuniformly-Sampled Signals
Abstract
For time-varying signals such as speech and audio, short-time analysis becomes necessary to compute specific signal attributes and to keep track of their evolution. The standard technique is the short-time Fourier transform (STFT), which decomposes a signal in terms of windowed Fourier bases. An advancement over the STFT is wavelet analysis, in which a function is represented in terms of shifted and dilated versions of a localized function called the wavelet. A specific modeling approach, particularly in the context of speech, is based on short-time linear prediction or short-time Wiener filtering of noisy speech. In most nonstationary signal processing formalisms, the key idea is to analyze the properties of the signal locally, either by first truncating the signal and then performing a basis expansion (as in the case of the STFT), or by choosing compactly supported basis functions (as in the case of wavelets). We retain the same motivation as these approaches, but use polynomials to model the signal on a short-time basis (“short-time polynomial representation”). To emphasize the local nature of the modeling aspect, we refer to it as “local polynomial modeling (LPM).”
We pursue two main threads of research in this thesis: (i) Short-time approaches for speech enhancement; and (ii) LPM for enhancing smooth signals, with applications to ECG, noisy nonuniformly sampled signals, and voiced/unvoiced segmentation in noisy speech.
Improved iterative Wiener filtering for speech enhancement
A constrained iterative Wiener filter solution for speech enhancement was proposed by Hansen and Clements. Sreenivas and Kirnapure improved the performance of the technique by imposing codebook-based constraints in the process of parameter estimation. The key advantage is that the optimal parameter search space is confined to the codebook. These nonstationary signal enhancement solutions assume stationary noise. However, in practical applications, noise is not stationary, and hence updating the noise statistics becomes necessary. We present a new approach to perform reliable noise estimation based on spectral subtraction. We first estimate the signal spectrum and subtract it from the noisy spectrum to estimate the noise power spectral density. We further smooth the estimated noise spectrum to ensure reliability. The key contributions are: (i) Adaptation of the technique to nonstationary noise; (ii) A new initialization procedure for faster convergence and higher accuracy; (iii) Experimental determination of the optimal LP-parameter space; and (iv) Objective criteria and speech recognition tests for performance comparison.
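The noise-update idea described above (subtract an estimate of the signal spectrum, then smooth) might be sketched as follows. This is only an illustration under stated assumptions, not the procedure used in the thesis: the function name `update_noise_psd`, the Hann window, the zero floor, and the first-order recursive smoothing with factor `alpha` are all hypothetical choices.

```python
import numpy as np

def update_noise_psd(noisy_frame, speech_psd_est, prev_noise_psd, alpha=0.9):
    """One noise-PSD update step (illustrative; names and smoothing rule are assumptions).

    noisy_frame    : time-domain frame of noisy speech
    speech_psd_est : current estimate of the clean-speech PSD (one-sided, n_fft/2 + 1 bins)
    prev_noise_psd : noise PSD estimate carried over from the previous frame
    alpha          : smoothing factor in [0, 1); larger means slower adaptation
    """
    n_fft = 2 * (len(speech_psd_est) - 1)
    window = np.hanning(len(noisy_frame))
    spectrum = np.fft.rfft(noisy_frame * window, n=n_fft)
    noisy_psd = np.abs(spectrum) ** 2 / np.sum(window ** 2)
    # Subtract the signal estimate; floor at zero so the PSD stays non-negative.
    raw_noise = np.maximum(noisy_psd - speech_psd_est, 0.0)
    # Recursive averaging over frames smooths the raw estimate.
    return alpha * prev_noise_psd + (1.0 - alpha) * raw_noise
```

In an iterative Wiener scheme, a function of this kind would be called once per frame, with `speech_psd_est` taken from the current iteration's speech model.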
Optimal local polynomial modeling and applications
We next address the problem of fitting a piecewise-polynomial model to a smooth signal corrupted by additive noise. Since the signal is smooth, it can be represented using low-order polynomial functions provided that they are locally adapted to the signal. We choose the mean-square error as the criterion of optimality. Since the model is local, it preserves the temporal structure of the signal and can also handle nonstationary noise. We show that there is a tradeoff between the adaptability of the model to local signal variations and robustness to noise (bias-variance tradeoff), which we solve using a stochastic optimization technique known as the intersection of confidence intervals (ICI) technique. The key tradeoff parameter is the duration of the window over which the optimum LPM is computed.
Within the LPM framework, we address three problems: (i) Signal reconstruction from noisy uniform samples; (ii) Signal reconstruction from noisy nonuniform samples; and (iii) Classification of speech signals into voiced and unvoiced segments.
The generic signal model is
x(t_n) = s(t_n) + d(t_n), 0 ≤ n ≤ N − 1.
In problems (i) and (iii) above, t_n = nT (uniform sampling); in (ii) the samples are taken at nonuniform instants. The signal s(t) is assumed to be smooth; i.e., it should admit a local polynomial representation. The problem in (i) and (ii) is to estimate s(t) from x(t_n); i.e., we are interested in optimal signal reconstruction on a continuous domain starting from uniform or nonuniform samples.
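As a rough illustration of reconstruction on a continuous domain, the sketch below fits a least-squares polynomial to the samples that fall inside a window centred on each query instant and reads off the fitted value there. It works for both uniform and nonuniform sampling instants. The function name `lpm_reconstruct`, the rectangular window, and the fixed window length are assumptions made for this example; the thesis selects the window length adaptively.

```python
import numpy as np

def lpm_reconstruct(t_samples, x_samples, t_query, window_len, order=2):
    """Local polynomial estimate of s(t) at arbitrary instants (illustrative sketch).

    t_samples, x_samples : sampling instants t_n and noisy samples x(t_n)
    t_query              : instants at which s(t) is to be estimated
    window_len           : window duration, in the same units as t
    order                : polynomial order of the local fit
    """
    t_samples = np.asarray(t_samples, dtype=float)
    x_samples = np.asarray(x_samples, dtype=float)
    t_query = np.atleast_1d(np.asarray(t_query, dtype=float))
    estimate = np.empty(len(t_query))
    for k, t0 in enumerate(t_query):
        inside = np.abs(t_samples - t0) <= window_len / 2
        if np.count_nonzero(inside) <= order:
            raise ValueError("window too short for the chosen polynomial order")
        # Centre the time axis on t0 so the constant term is the fitted value at t0.
        coeffs = np.polyfit(t_samples[inside] - t0, x_samples[inside], order)
        estimate[k] = coeffs[-1]
    return estimate
```

For uniform sampling, `t_samples` is simply `T * np.arange(N)`; for nonuniform sampling, any increasing set of instants can be passed.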
We show that, in both cases, the bias and variance take the general form:
The mean square error (MSE) is given by
where L is the length of the window over which the polynomial fitting is performed, f is a function of s(t), typically comprising the higher-order derivatives of s(t), with the order of the derivatives dependent on the order of the polynomial, and g is a function of the noise variance. It is clear that the bias and variance have complementary characteristics with respect to L. Directly optimizing the MSE would give a value of L that involves the functions f and g. The function g may be estimated, but f is not known since s(t) is unknown. Hence, it is not practical to compute the minimum MSE (MMSE) solution. Therefore, we obtain an approximate result by solving the bias-variance tradeoff in a probabilistic sense using the ICI technique. We also propose a new approach to optimally select the ICI technique parameters, based on a new cost function that is the sum of the probability of false alarm and the area covered over the confidence interval. In addition, we address issues related to optimal model-order selection, search space for window lengths, accuracy of noise estimation, etc.
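The basic ICI rule can be sketched in a few lines. For a set of increasing window lengths, one computes the local polynomial estimate and a confidence interval around it, and keeps the largest window for which all intervals so far still intersect. The sketch below assumes uniform samples, white noise with known standard deviation `noise_std`, and a fixed threshold `gamma`; the names `lpm_weights` and `ici_estimate` are hypothetical, and the parameter-selection method proposed in the thesis is not reproduced here.

```python
import numpy as np

def lpm_weights(t_local, order):
    """Weights w such that w @ x is the least-squares polynomial fit evaluated at local time 0."""
    vander = np.vander(t_local, order + 1, increasing=True)
    return np.linalg.pinv(vander)[0]

def ici_estimate(x, i, window_lens, noise_std, order=1, gamma=2.0):
    """Estimate s at sample i, choosing the window length by the ICI rule (sketch)."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    lower, upper = -np.inf, np.inf
    best_est, best_len = None, None
    for win in sorted(window_lens):
        half = win // 2
        lo, hi = max(0, i - half), min(n, i + half + 1)
        w = lpm_weights(np.arange(lo, hi, dtype=float) - i, order)
        est = w @ x[lo:hi]
        std = noise_std * np.linalg.norm(w)      # std of a linear estimate in white noise
        lower = max(lower, est - gamma * std)    # largest lower bound seen so far
        upper = min(upper, est + gamma * std)    # smallest upper bound seen so far
        if lower > upper:                        # intervals no longer intersect
            break
        best_est, best_len = est, win
    return best_est, best_len
```

On a noise-free step signal, for example, the rule keeps the largest window far from the step and falls back to the smallest window next to it, which is the adaptive behaviour the bias-variance tradeoff calls for.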
The next issue addressed is that of voiced/unvoiced segmentation of speech signals. Speech segments show different spectral and temporal characteristics based on whether the segment is voiced or unvoiced. Most speech processing techniques process the two types of segments differently. The challenge lies in making detection techniques offer robust performance in the presence of noise. We propose a new technique for voiced/unvoiced classification by taking into account the fact that voiced segments have a certain degree of regularity, and that the unvoiced segments do not possess any smoothness. In order to capture the regularity in voiced regions, we employ the LPM. The key idea is that regions where the LPM is inaccurate are more likely to be unvoiced than voiced. Within this framework, we formulate a hypothesis testing problem based on the accuracy of the LPM fit and devise a test statistic for performing V/UV classification. Since the technique is based on LPM, it is capable of adapting to nonstationary noise. We present Monte Carlo results to demonstrate the accuracy of the proposed technique.
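To make the idea concrete, the sketch below measures how well a sliding local polynomial fit explains a frame and labels the frame unvoiced when the fit is poor. The statistic (ratio of residual energy to frame energy), the window length, the polynomial order, and the threshold are all assumptions chosen for illustration; they are not the test statistic or hypothesis test devised in the thesis.

```python
import numpy as np

def lpm_residual_ratio(frame, window_len=9, order=2):
    """Fraction of frame energy left unexplained by a sliding local polynomial fit."""
    frame = np.asarray(frame, dtype=float)
    half = window_len // 2
    t = np.arange(-half, half + 1, dtype=float)
    # Weights giving the fitted value at the window centre (least-squares polynomial fit).
    w = np.linalg.pinv(np.vander(t, order + 1, increasing=True))[0]
    fitted = np.convolve(frame, w[::-1], mode="valid")
    centre = frame[half:len(frame) - half]
    residual = centre - fitted
    return np.sum(residual ** 2) / (np.sum(centre ** 2) + 1e-12)

def classify_frame(frame, threshold=0.5):
    """Label a frame by the accuracy of its LPM fit (threshold is a hypothetical value)."""
    return "unvoiced" if lpm_residual_ratio(frame) > threshold else "voiced"
```

A smooth, periodic frame yields a ratio close to zero, whereas a noise-like frame leaves most of its energy in the residual. In practice the threshold would have to be set from the noise level and validated on real speech.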