dc.contributor.advisor: Sreenivas, T V
dc.contributor.advisor: Hari, K V S
dc.contributor.author: Chetupalli, Srikanth Raj
dc.date.accessioned: 2020-10-21T06:02:16Z
dc.date.available: 2020-10-21T06:02:16Z
dc.date.submitted: 2020
dc.identifier.uri: https://etd.iisc.ac.in/handle/2005/4632
dc.description.abstract: A speech signal carries the spoken message and much more information, such as the speaker's emotion, identity and language and the characteristics of the speaking location, which makes human interaction lively, engaging and more useful. In present-day speech telecommunication, source-related attributes are largely preserved, but spatial attributes of the speech source, such as its position, the room reverberation and the ambient noise, are generally ignored. For a local listener, the spatial perception of a person's speech in a reverberant environment comprises three components: direction, range and "spaciousness". A reverberant signal model based on a linear system has two components, directional and diffuse, which are responsible for the perception of source direction and spaciousness, respectively. The perceived range of a source is commonly attributed to the relative levels of the directional and diffuse components. Analyzing the signals recorded by multiple microphones in terms of the source position and the directional and diffuse components would therefore enable perceptually sensitive spatial speech reconstruction. In this thesis, we consider communicating speech recorded in an enclosure, together with its spatial attributes, and reconstructing it at a receiver location, to achieve spatial speech communication. In our approach, spatial speech communication involves estimating the directional and diffuse components of a recorded source signal, as well as the relative source position, in the transmitter enclosure. We use arbitrarily placed microphones for multi-microphone spatial signal acquisition. We then extend a delayed multi-channel linear prediction (MCLP) formulation to estimate the directional and diffuse signal components. In MCLP, the diffuse signal component is modeled by linear prediction in the short-time Fourier transform (STFT) domain, and the prediction residual is taken as the directional component.
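The delayed prediction step can be illustrated with a minimal sketch, assuming a single microphone and a single STFT frequency bin, and using plain least squares in place of the estimators developed in the thesis. The names `delayed_lp` and `solve_linear`, the delay, the prediction order and the toy reverberation model `y[n] = s[n] + 0.5 y[n-2]` are illustrative assumptions only.

```python
import random


def solve_linear(A, b):
    """Solve the small dense complex system A x = b (Gaussian elimination, partial pivoting)."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        acc = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / M[r][r]
    return x


def delayed_lp(y, delay, order):
    """Delayed linear prediction on one STFT frequency bin (single channel, hypothetical helper).

    The prediction from frames y[n-delay-order+1 .. n-delay] models the diffuse
    part; the prediction residual is kept as the directional part.
    """
    start = delay + order - 1
    R = [[0j] * order for _ in range(order)]
    p = [0j] * order
    for n in range(start, len(y)):
        past = [y[n - delay - k] for k in range(order)]
        for i in range(order):
            p[i] += past[i] * y[n].conjugate()
            for j in range(order):
                R[i][j] += past[i] * past[j].conjugate()
    g = solve_linear(R, p)  # least-squares prediction filter from R g = p
    directional = list(y)
    for n in range(start, len(y)):
        directional[n] = y[n] - sum(g[k].conjugate() * y[n - delay - k] for k in range(order))
    diffuse = [yn - dn for yn, dn in zip(y, directional)]
    return directional, diffuse, g


# Toy bin (assumption): white "directional" source plus a recursive tail starting 2 frames later.
rng = random.Random(0)
D = 2
s = [complex(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(4000)]
y = []
for n, sn in enumerate(s):
    y.append(sn + (0.5 * y[n - D] if n >= D else 0))
directional, diffuse, g = delayed_lp(y, delay=D, order=3)
```

For this toy bin the estimated filter `g` should come out close to `[0.5, 0, 0]`, so the residual `directional` tracks the clean sequence `s`. Skipping the most recent `delay` frames is the usual motivation for delaying the prediction: short-lag correlation of the direct speech stays in the residual instead of being predicted away.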
We develop three methods for estimating the prediction filter. In the first method, we exploit the heavy-tailed distribution of the directional signal's STFT coefficients and propose a Bayesian estimator based on the Student's t-distribution. The model also includes an independent Gaussian prior on the prediction coefficients to account for the unknown prediction order of the MCLP. In the second method, knowledge of clean speech signals is used explicitly as a constraint on the filter estimation, through power spectral density (PSD) estimation constrained by an autoencoder neural network. This constraint is shown to be more effective for the iterative MCLP solution. In the third method, we consider a directional sound propagation model based on the relative acoustic transfer function (RTF), together with distortionless-response spatial filter estimation. This method combines the benefits of MCLP dereverberation and spatial filtering, which makes it better suited to scenarios with noise or other directional interferers. Furthermore, the source direction is estimated as a latent parameter in this method. We then extend the formulation to a moving source, accommodating changes in source position. Using a linear dynamical system approach and online spatial filtering, we develop a scheme that achieves both dereverberation and source tracking. We recognize that the directional and diffuse components of the reverberant source signal also carry cues about the source and microphone positions. We therefore develop a method that computes the geometry of the multi-microphone placement from the diffuse component of the microphone signals, and then uses the directional component to estimate the source position relative to the microphones.
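For the third method, the distortionless-response idea can be sketched on its own with the textbook MVDR weights w = Phi^{-1} h / (h^H Phi^{-1} h), where h is the RTF vector relative to a reference microphone and Phi is the noise-plus-interference covariance. This is an illustration under stated assumptions, not the thesis' estimator: the joint MCLP and spatial filtering, the latent source direction and the online tracking are not reproduced, and the RTF `h`, interferer vector `v`, powers and function names are hypothetical.

```python
def solve_linear(A, b):
    """Solve the small dense complex system A x = b (Gaussian elimination, partial pivoting)."""
    n = len(A)
    M = [list(A[i]) + [b[i]] for i in range(n)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[piv] = M[piv], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0j] * n
    for r in range(n - 1, -1, -1):
        acc = M[r][n] - sum(M[r][c] * x[c] for c in range(r + 1, n))
        x[r] = acc / M[r][r]
    return x


def mvdr_weights(phi, h):
    """Distortionless-response (MVDR) weights: w = Phi^{-1} h / (h^H Phi^{-1} h)."""
    u = solve_linear(phi, h)  # u = Phi^{-1} h without forming the inverse
    denom = sum(hi.conjugate() * ui for hi, ui in zip(h, u))
    return [ui / denom for ui in u]


def apply_beamformer(w, x):
    """Beamformer output w^H x for one STFT snapshot x."""
    return sum(wi.conjugate() * xi for wi, xi in zip(w, x))


# Hypothetical 3-microphone example; the reference microphone comes first, so h[0] = 1.
h = [1 + 0j, 1j, -1 + 0j]        # assumed RTF of the target source
v = [1 + 0j, -1 + 0j, 1 + 0j]    # assumed steering vector of a directional interferer
P, sigma2 = 100.0, 1.0           # assumed interferer power and sensor-noise power
phi = [[(sigma2 if i == j else 0.0) + P * v[i] * v[j].conjugate() for j in range(3)]
       for i in range(3)]
w = mvdr_weights(phi, h)
gain_target = apply_beamformer(w, h)
gain_interf = apply_beamformer(w, v)
```

By construction the target gain w^H h equals one, while the strong interferer is attenuated, here to a magnitude of about 0.001.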
The reverberant signal analysis and source position estimation are shown to be useful for spatial speech communication using multiple spatially distributed microphone arrays for signal acquisition at the transmitter and multi-loudspeaker reconstruction at the receiver. We estimate the reverberant signal components and the source direction separately at each microphone array of the transmitter, and the source position is estimated by fusing the individual direction estimates. The directional and diffuse components of the reverberant signal at the microphone array nearest to the estimated source position are used for reconstruction at the receiver. A perceptually effective yet simple reconstruction is used at the receiver, in which the directional component of the reverberant signal is assumed to arrive from a point source and the reverberant component is assumed to be diffuse all around the listener. We simulate the virtual transmitter source location using a four-loudspeaker setup at the receiver: vector base amplitude panning (VBAP) is applied to the directional component, and the diffuse component is used to recreate the spaciousness. Spaciousness is achieved by playing back the diffuse sound equally from all loudspeakers after decorrelation and gain normalization. We show that the formulation is amenable to spatial scene modification, such as changing the source direction and distance, using the decomposed signal components and the source parameters.
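The receiver-side rendering can be sketched with pairwise two-dimensional VBAP for the directional component and equal, power-normalized gains for the decorrelated diffuse feeds. The square loudspeaker layout at 45, 135, 225 and 315 degrees, the function names and the 1/sqrt(N) diffuse gain rule are assumptions for illustration; the decorrelation filters themselves are not shown.

```python
import math

SPEAKERS_DEG = [45.0, 135.0, 225.0, 315.0]  # hypothetical square layout around the listener


def unit(deg):
    """Unit vector in the horizontal plane for an azimuth in degrees."""
    r = math.radians(deg)
    return (math.cos(r), math.sin(r))


def vbap_2d(source_deg, speakers_deg=SPEAKERS_DEG):
    """Pairwise 2-D VBAP gains, normalized so that the squared gains sum to 1."""
    n = len(speakers_deg)
    order = sorted(range(n), key=lambda i: speakers_deg[i] % 360.0)
    p = unit(source_deg)
    gains = [0.0] * n
    for idx in range(n):
        a, b = order[idx], order[(idx + 1) % n]  # adjacent loudspeaker pair
        (l11, l12), (l21, l22) = unit(speakers_deg[a]), unit(speakers_deg[b])
        det = l11 * l22 - l12 * l21
        # g = p^T L^{-1}, where the rows of L are the two loudspeaker unit vectors.
        g1 = (p[0] * l22 - p[1] * l21) / det
        g2 = (-p[0] * l12 + p[1] * l11) / det
        if g1 >= -1e-9 and g2 >= -1e-9:  # both gains non-negative: source lies within this pair
            norm = math.hypot(g1, g2)
            gains[a], gains[b] = max(g1, 0.0) / norm, max(g2, 0.0) / norm
            return gains
    raise ValueError("no loudspeaker pair encloses the source direction")


def diffuse_gains(n_speakers):
    """Equal gains for mutually decorrelated diffuse feeds; their powers add to 1."""
    return [1.0 / math.sqrt(n_speakers)] * n_speakers


g90 = vbap_2d(90.0)
g0 = vbap_2d(0.0)
dg = diffuse_gains(len(SPEAKERS_DEG))
```

A source at 90 degrees is split equally between the loudspeakers at 45 and 135 degrees (gains of about 0.707 each), and each of the four diffuse feeds gets a gain of 0.5, so that, for mutually decorrelated feeds, the summed diffuse power matches the original.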
dc.language.iso: en_US
dc.rights: I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.
dc.subject: Spatial Audio
dc.subject: Microphone arrays
dc.subject: speech dereverberation
dc.subject: autoencoders
dc.subject: multi-channel linear prediction
dc.subject: Bayesian estimation
dc.subject: Acoustic source localization
dc.subject: Speech Processing
dc.subject.classification: Research Subject Categories::TECHNOLOGY::Electrical engineering, electronics and photonics::Electronics
dc.title: Spatial Analysis and Reconstruction of Reverberant Speech
dc.type: Thesis
dc.degree.name: PhD
dc.degree.level: Doctoral
dc.degree.grantor: Indian Institute of Science
dc.degree.discipline: Engineering