Show simple item record

dc.contributor.advisor: Ghosh, Prasanta Kumar
dc.contributor.author: Naini, Abinay Reddy
dc.date.accessioned: 2021-11-18T10:19:38Z
dc.date.available: 2021-11-18T10:19:38Z
dc.date.submitted: 2020
dc.identifier.uri: https://etd.iisc.ac.in/handle/2005/5516
dc.description.abstract: Like neutral speech, whispered speech is a natural mode of speech production, and speakers often use it in day-to-day life. For some people, such as laryngectomees, whispered speech is the only mode of communication. Despite the absence of voicing in whispered speech and its acoustic differences from neutral speech, previous work has demonstrated that whispered speech carries adequate information about both the content and the speaker. Virtual assistants have recently become more natural and widespread. This has led to more scenarios in which a device must detect speech and verify the speaker even when the speaker whispers. Because whispered speech has noise-like characteristics, detecting it is a challenge. Moreover, a typical speaker verification system in which neutral speech is used to enroll speakers but whispered speech is used for testing often performs poorly, owing to the difference in acoustic characteristics between whispered and neutral speech. Hence, the aim of this thesis is two-fold: 1) to develop a robust whisper activity detector specifically for the speaker verification task, and 2) to improve whispered speech based speaker verification performance. The contributions of this thesis lie in whisper activity detection as well as whispered speech based speaker verification. It is shown how attention-based average pooling in a speaker verification model can be used to detect whispered speech regions in noisy audio more accurately than the best of the available baseline schemes. To improve speaker verification using whispered speech, we propose features based on formant gaps and show that these features are more invariant to the mode of speech than the best of the existing features. We also propose two feature mapping methods that convert whispered features to neutral features for speaker verification.
In the first method, we introduce a novel objective function, based on cosine similarity, for training a DNN used for feature mapping. In the second method, we iteratively optimize the feature mapping model using the cosine similarity based objective function and the total variability space likelihood in the i-vector based background model. The proposed optimization provides a more reliable mapping from whispered features to neutral features, resulting in a 44.8% relative improvement in speaker verification equal error rate over an existing DNN-based feature mapping scheme. [en_US]
dc.language.iso: en_US [en_US]
dc.rights: I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation. [en_US]
dc.subject: Speaker Verification [en_US]
dc.subject: Whispered Speech [en_US]
dc.subject: Voice activity detection [en_US]
dc.subject.classification: Research Subject Categories::TECHNOLOGY::Electrical engineering, electronics and photonics::Electrical engineering [en_US]
dc.title: Speaker verification using whispered speech [en_US]
dc.type: Thesis [en_US]
dc.degree.name: MTech (Res) [en_US]
dc.degree.level: Masters [en_US]
dc.degree.grantor: Indian Institute of Science [en_US]
dc.degree.discipline: Engineering [en_US]

