Nonstationary Techniques For Signal Enhancement With Applications To Speech, ECG, And Nonuniformly-Sampled Signals
Abstract
For time-varying signals such as speech and audio, short-time analysis becomes necessary to compute specific signal attributes and to keep track of their evolution. The standard technique is the short-time Fourier transform (STFT), using which one decomposes a signal in terms of windowed Fourier bases. An advancement over STFT is the wavelet analysis in which a function is represented in terms of shifted and dilated versions of a localized function called the wavelet. A specific modeling approach particularly in the context of speech is based on short-time linear prediction or short-time Wiener filtering of noisy speech. In most nonstationary signal processing formalisms, the key idea is to analyze the properties of the signal locally, either by first truncating the signal and then performing a basis expansion (as in the case of STFT), or by choosing compactly-supported basis functions (as in the case of wavelets). We retain the same motivation as these approaches, but use polynomials to model the signal on a short-time basis (“short-time polynomial representation”). To emphasize the local nature of the modeling aspect, we refer to it as “local polynomial modeling (LPM).”
We pursue two main threads of research in this thesis: (i) Short-time approaches for speech enhancement; and (ii) LPM for enhancing smooth signals, with applications to ECG, noisy nonuniformly-sampled signals, and voiced/unvoiced segmentation in noisy speech.
Improved iterative Wiener filtering for speech enhancement
A constrained iterative Wiener filter solution for speech enhancement was proposed by Hansen and Clements. Sreenivas and Kirnapure improved the performance of the technique by imposing codebook-based constraints in the process of parameter estimation. The key advantage is that the optimal parameter search space is confined to the codebook. The Nonstationary signal enhancement solutions assume stationary noise. However, in practical applications, noise is not stationary and hence updating the noise statistics becomes necessary. We present a new approach to perform reliable noise estimation based on spectral subtraction. We first estimate the signal spectrum and perform signal subtraction to estimate the noise power spectral density. We further smooth the estimated noise spectrum to ensure reliability. The key contributions are: (i) Adaptation of the technique for non-stationary noises; (ii) A new initialization procedure for faster convergence and higher accuracy; (iii) Experimental determination of the optimal LP-parameter space; and (iv) Objective criteria and speech recognition tests for performance comparison.
Optimal local polynomial modeling and applications
We next address the problem of fitting a piecewise-polynomial model to a smooth signal corrupted by additive noise. Since the signal is smooth, it can be represented using low-order polynomial functions provided that they are locally adapted to the signal. We choose the mean-square error as the criterion of optimality. Since the model is local, it preserves the temporal structure of the signal and can also handle nonstationary noise. We show that there is a trade-off between the adaptability of the model to local signal variations and robustness to noise (bias-variance trade-off), which we solve using a stochastic optimization technique known as the intersection of confidence intervals (ICI) technique. The key trade-off parameter is the duration of the window over which the optimum LPM is computed.
Within the LPM framework, we address three problems: (i) Signal reconstruction from noisy uniform samples; (ii) Signal reconstruction from noisy nonuniform samples; and (iii) Classification of speech signals into voiced and unvoiced segments.
The generic signal model is
x(tn)=s(tn)+d(tn),0 ≤ n ≤ N - 1.
In problems (i) and (iii) above, tn=nT(uniform sampling); in (ii) the samples are taken at nonuniform instants. The signal s(t)is assumed to be smooth; i.e., it should admit a local polynomial representation. The problem in (i) and (ii) is to estimate s(t)from x(tn); i.e., we are interested in optimal signal reconstruction on a continuous domain starting from uniform or nonuniform samples.
We show that, in both cases, the bias and variance take the general form:
The mean square error (MSE) is given by
where L is the length of the window over which the polynomial fitting is performed, f is a function of s(t), which typically comprises the higher-order derivatives of s(t), the order itself dependent on the order of the polynomial, and g is a function of the noise variance. It is clear that the bias and variance have complementary characteristics with respect to L. Directly optimizing for the MSE would give a value of L, which involves the functions f and g. The function g may be estimated, but f is not known since s(t)is unknown. Hence, it is not practical to compute the minimum MSE (MMSE) solution. Therefore, we obtain an approximate result by solving the bias-variance trade-off in a probabilistic sense using the ICI technique. We also propose a new approach to optimally select the ICI technique parameters, based on a new cost function that is the sum of the probability of false alarm and the area covered over the confidence interval. In addition, we address issues related to optimal model-order selection, search space for window lengths, accuracy of noise estimation, etc.
The next issue addressed is that of voiced/unvoiced segmentation of speech signal. Speech segments show different spectral and temporal characteristics based on whether the segment is voiced or unvoiced. Most speech processing techniques process the two segments differently. The challenge lies in making detection techniques offer robust performance in the presence of noise. We propose a new technique for voiced/unvoiced clas-sification by taking into account the fact that voiced segments have a certain degree of regularity, and that the unvoiced segments do not possess any smoothness. In order to capture the regularity in voiced regions, we employ the LPM. The key idea is that regions where the LPM is inaccurate are more likely to be unvoiced than voiced. Within this frame-work, we formulate a hypothesis testing problem based on the accuracy of the LPM fit and devise a test statistic for performing V/UV classification. Since the technique is based on LPM, it is capable of adapting to nonstationary noises. We present Monte Carlo results to demonstrate the accuracy of the proposed technique.
Collections
Related items
Showing items related by title, author, creator and subject.
-
Music And Speech Analysis Using The 'Bach' Scale Filter-Bank
Ananthakrishnan, G (2009-08-13)The aim of this thesis is to define a perceptual scale for the ‘Time-Frequency’ analysis of music signals. The equal tempered ‘Bach ’ scale is a suitable scale, since it covers most of the genres of music and the error is ... -
Characterization of the Voice Source by the DCT for Speaker Information
Abhiram, B (2017-12-10)Extracting speaker-specific information from speech is of great interest to both researchers and developers alike, since speaker recognition technology finds application in a wide range of areas, primary among them being ... -
Timbre Perception of Time-Varying Signals
Arthi, S (2017-12-10)Every auditory event provides an information-rich signal to the brain. The signal constitutes perceptual attributes of pitch, loudness, timbre, and also, conceptual attributes like location, emotions, meaning, etc. In the ...