Non-uniform sample based speech analysis and coding
Abstract
The utility of non-uniform sample-based analysis and reconstruction of speech signals is explored. Instead of a perfect reconstruction criterion, properties of speech perception are exploited to select non-uniform samples from which perceptually good-quality speech signals are reconstructed. Optimum non-uniform sampling is also discussed, wherein the optimum non-uniform samples are selected to minimize the reconstruction error. Several interpolation functions (both local and global) for reconstruction of speech from the non-uniform samples are studied. Optimum sampling of the excitation signal is also examined for speech signal reconstruction. Separate optimum quantizers are designed for non-uniform sample location and non-uniform sample amplitude; using these optimum quantizers, variable-rate speech coders are developed for different non-uniform sample-based reconstruction schemes.
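As a rough illustration of reconstructing a signal from non-uniform samples with a local interpolation function, the following minimal sketch keeps only the local extrema of a synthetic signal and rebuilds the uniform-grid signal by piecewise-linear interpolation. The choice of extrema as the retained samples, the linear interpolator, the test signal, and the names `extrema_indices` and `reconstruct_linear` are all assumptions made for illustration; the sample selection criteria and interpolation functions studied in this work are not specified here.

```python
import numpy as np

def extrema_indices(x):
    """Indices of local extrema of a 1-D signal, plus both endpoints."""
    d = np.diff(x)
    # A sign change in the first difference marks a local maximum or minimum.
    turning = np.where(d[:-1] * d[1:] < 0)[0] + 1
    return np.unique(np.concatenate(([0], turning, [len(x) - 1])))

def reconstruct_linear(idx, values, n):
    """Piecewise-linear (local) interpolation back onto the uniform grid."""
    return np.interp(np.arange(n), idx, values)

# Hypothetical test signal: two sinusoids sampled uniformly at 8 kHz.
fs = 8000
t = np.arange(160) / fs
x = np.sin(2 * np.pi * 200 * t) + 0.5 * np.sin(2 * np.pi * 450 * t)

idx = extrema_indices(x)                      # non-uniform sample locations
x_hat = reconstruct_linear(idx, x[idx], len(x))
mse = np.mean((x - x_hat) ** 2)               # reconstruction error
print(f"kept {len(idx)} of {len(x)} samples, MSE = {mse:.4f}")
```

In a coder built along these lines, the locations `idx` and the amplitudes `x[idx]` would be the two quantities to quantize separately.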
A novel non-uniform sample-based temporal feature, namely Extrema-based Signal Track Length (ESTL), is introduced, and several properties of this feature are studied. It is shown that this temporal feature is a function of both the local signal amplitude and the local signal frequency, and therefore is suitable for describing the time-varying characteristics of a non-stationary signal like speech. Using this new feature, a computationally efficient speech segmentation algorithm is proposed. The performance of this algorithm is found to be comparable to the best of the spectral-domain-feature-based segmentation techniques.
Extrema are shown to be potentially useful for enhancing stationarity in speech signals, and therefore useful for estimating the time-varying pitch of speech signals. An extrema-based unwarping scheme is proposed, which enhances the periodicity of speech in the unwarped signal domain, thereby enabling the use of the standard autocorrelation method for reliable estimation of pitch.
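For reference, the standard autocorrelation method of pitch estimation mentioned above can be sketched as follows. This is only the generic final step applied to a single frame; it does not implement the extrema-based unwarping scheme. The function name `autocorr_pitch`, the 60 to 400 Hz search range, and the synthetic test frame are assumptions chosen for illustration.

```python
import numpy as np

def autocorr_pitch(frame, fs, fmin=60.0, fmax=400.0):
    """Estimate pitch (Hz) as the lag of the autocorrelation peak."""
    frame = frame - np.mean(frame)
    # Keep only non-negative lags of the full autocorrelation.
    r = np.correlate(frame, frame, mode="full")[len(frame) - 1:]
    lag_min = int(fs / fmax)                       # shortest period searched
    lag_max = min(int(fs / fmin), len(r) - 1)      # longest period searched
    lag = lag_min + int(np.argmax(r[lag_min:lag_max + 1]))
    return fs / lag

# Hypothetical periodic frame: 100 Hz fundamental plus its second harmonic.
fs = 8000
n = np.arange(320)
frame = np.sin(2 * np.pi * 100 * n / fs) + 0.5 * np.sin(2 * np.pi * 200 * n / fs)

f0 = autocorr_pitch(frame, fs)
print(f"estimated pitch: {f0:.1f} Hz")
```

On a strictly periodic frame like this one the estimate should land near the 100 Hz fundamental. When the pitch drifts within the frame the autocorrelation peak broadens, which is the situation the periodicity-enhancing unwarping is meant to address.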

