Mask Estimator Approaches For Audio Beamforming

Kumar, Rohit

dc.contributor.advisor	Ganapathy, Sriram
dc.contributor.author	Kumar, Rohit
dc.date.accessioned	2020-12-03T10:54:53Z
dc.date.available	2020-12-03T10:54:53Z
dc.date.submitted	2020
dc.identifier.uri	https://etd.iisc.ac.in/handle/2005/4711
dc.description.abstract	Beamforming is a family of algorithms that perform spatial filtering, making it possible to map the distribution of sources at a certain distance from the microphones and thereby locate the strongest source. State-of-the-art methods for acoustic beamforming in multi-channel ASR are based on a neural mask estimator that predicts the presence of speech and noise, which is in turn used to determine the spatial filter coefficients. These models are trained using a paired corpus of clean and noisy recordings (teacher model). In this thesis, we attempt to move away from the requirement of having supervised clean recordings for training the mask estimator. Models based on signal enhancement and beamforming using multi-channel linear prediction provide the required mask estimate. In this way, model training can also be carried out on real recordings of noisy speech, rather than only on simulated ones as in a typical teacher model. We propose two models in this thesis, both based on unsupervised mask estimation. Several experiments on noisy and reverberant environments in the CHiME-3 corpus as well as the REVERB challenge corpus highlight the effectiveness of the proposed approaches. Both methods are novel: the first model deals only with real data, while the second deals with complex data, i.e., complex short-time Fourier transform features, to obtain the mask estimate.	en_US
dc.language.iso	en_US	en_US
dc.rights	I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.	en_US
dc.subject	Generalized Eigen Value Beamforming	en_US
dc.subject	Neural Mask Estimation	en_US
dc.subject	Unsupervised Learning	en_US
dc.subject	Multi-channel ASR	en_US
dc.subject	Complex Transformer	en_US
dc.subject.classification	Research Subject Categories::TECHNOLOGY::Electrical engineering, electronics and photonics::Electrical engineering	en_US
dc.title	Mask Estimator Approaches For Audio Beamforming	en_US
dc.type	Thesis	en_US
dc.degree.name	MS	en_US
dc.degree.level	Masters	en_US
dc.degree.grantor	Indian Institute of Science	en_US
dc.degree.discipline	Engineering	en_US

Files in this item

Name: Mtech_Thesis.pdf
Size: 1.533Mb
Format: PDF
Description: Thesis full text


This item appears in the following Collection(s)

Electrical Communication Engineering (ECE) [405]
