Speaker verification using whispered speech

Naini, Abinay Reddy

dc.contributor.advisor	Ghosh, Prasanta Kumar
dc.contributor.author	Naini, Abinay Reddy
dc.date.accessioned	2021-11-18T10:19:38Z
dc.date.available	2021-11-18T10:19:38Z
dc.date.submitted	2020
dc.identifier.uri	https://etd.iisc.ac.in/handle/2005/5516
dc.description.abstract	Like neutral speech, whispered speech is a natural mode of speech production, and speakers often use it in day-to-day life. For some people, such as laryngectomees, whispered speech is the only mode of communication. Despite the absence of voicing in whispered speech and its different characteristics compared with neutral speech, previous work has demonstrated that whispered speech contains adequate information about both the content and the speaker. In recent times, virtual assistants have become more natural and widespread. This has led to an increase in scenarios where a device has to detect speech and verify the speaker even if the speaker whispers. Because of its noise-like characteristics, whispered speech is challenging to detect. Moreover, a typical speaker verification system in which neutral speech is used for enrolling speakers but whispered speech is used for testing often performs poorly because of the difference in acoustic characteristics between whispered and neutral speech. Hence, the aim of this thesis is two-fold: 1) to develop a robust whisper activity detector specifically for the speaker verification task, and 2) to improve whispered-speech-based speaker verification performance. The contributions of this thesis lie in whisper activity detection as well as whispered-speech-based speaker verification. It is shown how attention-based average pooling in a speaker verification model can be used to detect whispered speech regions in noisy audio more accurately than the best of the available baseline schemes. To improve speaker verification using whispered speech, we proposed features based on formant gaps and showed that these features are more invariant to the mode of speech than the best of the existing features. We also proposed two feature mapping methods to convert whispered features to neutral features for speaker verification.
In the first method, we introduced a novel objective function, based on cosine similarity, for training a DNN used for feature mapping. In the second method, we iteratively optimized the feature mapping model using the cosine-similarity-based objective function and the total variability space likelihood in the i-vector based background model. The proposed optimization provided a more reliable mapping from whispered features to neutral features, resulting in a 44.8% relative improvement in speaker verification equal error rate over an existing DNN-based feature mapping scheme.	en_US
dc.language.iso	en_US	en_US
dc.rights	I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.	en_US
dc.subject	Speaker Verification	en_US
dc.subject	Whispered Speech	en_US
dc.subject	Voice activity detection	en_US
dc.subject.classification	Research Subject Categories::TECHNOLOGY::Electrical engineering, electronics and photonics::Electrical engineering	en_US
dc.title	Speaker verification using whispered speech	en_US
dc.type	Thesis	en_US
dc.degree.name	MTech (Res)	en_US
dc.degree.level	Masters	en_US
dc.degree.grantor	Indian Institute of Science	en_US
dc.degree.discipline	Engineering	en_US

Files in this item

Name: abinay-nainy-revised-thesis.pdf
Size: 2.358Mb
Format: PDF
Description: Thesis full text

This item appears in the following Collection(s)

Electrical Engineering (EE) [421]
