    •   etd@IISc
    • Division of Electrical, Electronics, and Computer Science (EECS)
    • Electrical Engineering (EE)

    Dereverberation of Speech Using Autoregressive Models of Sub-band Envelopes

    View/Open
    Thesis full text (6.078Mb)
    Author
    Purushothaman, Anurenjan
    Abstract
    Automatic speech recognition (ASR) based technologies are radically changing the way we interact with digital services and information. Most of these applications rely on hands-free speech, where talkers can speak at a distance from the microphones without the nuisance of a handheld or body-worn device. Applications such as meeting annotation, speech-to-text transcription in teleconferencing, and hands-free interfaces for controlling consumer products (interactive TV, virtual assistants in mobile phones, smart speakers, etc.) will benefit from the distant-talking mode of operation. The main issue in distant-talking speech recognition is the corruption of speech signals by noise and reverberation. This thesis focuses on developing dereverberation methods for speech processing using sub-band temporal envelopes, and pursues two broad directions for addressing issues in far-field ASR. In the first part, two methods for dereverberation are proposed. In the second part, we develop a speech enhancement model in which the audio signal is re-synthesized using dereverberated temporal envelopes and the corresponding carrier components.
    The first method of the first part develops a 3-D acoustic modeling framework for far-field ASR, in which spatio-spectral features are extracted from all the available channels. The features that are input to the 3-D CNN are extracted by modeling the signal peaks in the spatio-spectral domain using a multivariate autoregressive (AR) modeling approach. This AR model is efficient in capturing the channel correlations in the frequency domain of the multi-channel signal. In the second method, a neural model for speech dereverberation using the long-term sub-band envelopes of speech is developed. The neural dereverberation model estimates an envelope gain which, when applied to the reverberant signal, suppresses the late reflection components in the far-field signal. The dereverberated envelopes are then used for feature extraction in speech recognition.
    The second part of the thesis deals with envelope-carrier based speech enhancement. Here, we investigate the effect of far-field artifacts on temporal envelopes and the corresponding carrier components. A dual-path recurrent neural model is used to learn, in parallel, the mappings for the reverberant envelopes and the carrier signals. Further, joint learning of the speech enhancement model and the end-to-end ASR model as a single neural model is proposed. Both parts of the thesis use the frequency domain linear prediction (FDLP) based model for extracting the envelopes of the sub-band signals in long analysis windows. We report several ASR and speech quality experiments that highlight the benefits of the proposed techniques.
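    The FDLP envelope extraction named in the abstract can be illustrated with a small sketch. This is not the thesis implementation: it is a minimal, dependency-free Python illustration of the general FDLP idea, in which linear prediction is applied to the DCT of a long analysis window so that the AR model's power response approximates the temporal envelope. The function names (`dct2`, `levinson_durbin`, `fdlp_envelope`), the model order, the small diagonal loading, and the mapping of sample n to the angle pi * (n + 0.5) / N are all assumptions made for illustration. The sub-band decomposition is omitted, so the whole window is treated as one band.

```python
import math


def dct2(x):
    """Unnormalised DCT-II: c[k] = sum_n x[n] * cos(pi * (n + 0.5) * k / N)."""
    n_samples = len(x)
    return [
        sum(x[n] * math.cos(math.pi * (n + 0.5) * k / n_samples)
            for n in range(n_samples))
        for k in range(n_samples)
    ]


def levinson_durbin(r, order):
    """Solve the Yule-Walker equations for A(z) = 1 + sum_j a[j] z^-j.

    Returns (a, err), where a[0] == 1 and err is the final prediction error.
    """
    a = [1.0] + [0.0] * order
    err = r[0]
    for i in range(1, order + 1):
        acc = r[i] + sum(a[j] * r[i - j] for j in range(1, i))
        k = -acc / err                      # reflection coefficient
        new_a = a[:]
        for j in range(1, i):
            new_a[j] = a[j] + k * a[i - j]
        new_a[i] = k
        a = new_a
        err *= 1.0 - k * k
    return a, err


def fdlp_envelope(x, order=10, loading=1e-9):
    """Illustrative FDLP: AR model of the DCT sequence, read out over time.

    `order` and `loading` are assumed values, not taken from the thesis.
    """
    n_samples = len(x)
    c = dct2(x)
    # Biased autocorrelation of the DCT coefficients (autocorrelation method).
    r = [sum(c[k] * c[k + m] for k in range(n_samples - m))
         for m in range(order + 1)]
    r[0] *= 1.0 + loading                   # light regularisation (assumption)
    a, err = levinson_durbin(r, order)
    env = []
    for n in range(n_samples):
        theta = math.pi * (n + 0.5) / n_samples   # time index -> AR "frequency"
        re = sum(a[j] * math.cos(theta * j) for j in range(order + 1))
        im = -sum(a[j] * math.sin(theta * j) for j in range(order + 1))
        env.append(err / (re * re + im * im))
    return env


# Toy input: a carrier with a weak constant level and one burst of energy.
n_total, burst_centre = 256, 180
signal = [
    (0.05 + math.exp(-0.5 * ((n - burst_centre) / 6.0) ** 2))
    * math.cos(2.0 * math.pi * 0.2 * n)
    for n in range(n_total)
]
envelope = fdlp_envelope(signal, order=10)
peak_index = max(range(n_total), key=envelope.__getitem__)
```

    In this toy case the estimated envelope should be largest near the burst, which is the property that makes the representation useful for studying how reverberation smears energy over time. A practical system would use an FFT-based DCT and apply the model per sub-band.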
    URI
    https://etd.iisc.ac.in/handle/2005/6236
    Collections
    • Electrical Engineering (EE) [357]

    etd@IISc is a joint service of SERC & J R D Tata Memorial (JRDTML) Library || Powered by DSpace software || DuraSpace
    Contact Us | Send Feedback | Thesis Templates
    Theme by 
    Atmire NV
     

     
