Automatic Speech Recognition Solutions For Resource Constrained Devices

Sasindran, Zitha

dc.contributor.advisor	Prabhakar, T V
dc.contributor.author	Sasindran, Zitha
dc.date.accessioned	2024-05-15T04:36:24Z
dc.date.available	2024-05-15T04:36:24Z
dc.date.submitted	2024
dc.identifier.uri	https://etd.iisc.ac.in/handle/2005/6510
dc.description.abstract	Automatic Speech Recognition (ASR) systems, pivotal in applications ranging from voice assistants to transcription services and assistive technologies, still need improvement when confronted with the diverse variations in users' voice patterns. One way to address this challenge is to enhance existing systems with user voice adaptation techniques. Implementing these techniques directly on edge devices presents inherent challenges, but doing so keeps the sensitive information in spoken utterances private. Our research attempts to bridge the gap between theoretical advances and real-world implementation by translating user voice adaptation techniques into fully operational systems. By navigating the challenges posed by edge devices, our work aims to contribute to robust, real-world ASR systems that can serve as a cornerstone for the evolution of improved speech recognition. In our first work, we propose a resource-aware framework for user voice personalization of ASR models on constrained edge devices such as mobile phones. We consider the memory and battery capabilities of the devices to make informed decisions when choosing the most suitable sub-model for training under limited resources. In our second work, we introduce a new Federated Learning (FL) framework that lets edge devices collaboratively train ASR models. We describe the entire methodology for deploying the model with FL functionality and provide a thorough evaluation of the framework in a real-world setup that uses actual mobile phones as client devices. In our third work, we introduce a client selection algorithm for FL that optimizes waiting time by considering system resources, including the computation, storage, power, and phone-specific capabilities of client devices. 
Our algorithm dynamically adjusts the number of training epochs for selected clients according to their available resources, thereby minimizing waiting times in the FL process. In our fourth work, we introduce a novel semi-asynchronous FL framework for edge devices. We compute when the server should aggregate the weights using our resource-aware work allocation algorithm with a partial modeling approach. This strategy helps mitigate staleness in practical asynchronous FL deployments. Our next work addresses ASR errors by enhancing the decoding algorithm and introducing an error correction algorithm that uses token-based language models and pronunciation models. Errors frequently observed in ASR output include word boundary disambiguation mistakes, phonetically ambiguous words, and spelling errors, among others. Motivated by the limitations of the current standard evaluation metrics for ASR tasks, we present two approaches for developing improved evaluation metrics for ASR systems. Finally, we put forward two metrics, Heval and SeMaScore, and demonstrate their effectiveness in evaluating ASR systems, particularly on atypical speech patterns.	en_US
dc.language.iso	en_US	en_US
dc.relation.ispartofseries	;ET00519
dc.rights	I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation	en_US
dc.subject	Automatic Speech Recognition	en_US
dc.subject	edge devices	en_US
dc.subject	on-device training	en_US
dc.subject	federated learning	en_US
dc.subject.classification	Research Subject Categories::TECHNOLOGY::Electrical engineering, electronics and photonics::Electronics	en_US
dc.title	Automatic Speech Recognition Solutions For Resource Constrained Devices	en_US
dc.type	Thesis	en_US
dc.degree.name	PhD	en_US
dc.degree.level	Doctoral	en_US
dc.degree.grantor	Indian Institute of Science	en_US
dc.degree.discipline	Engineering	en_US
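The abstract describes the third work's client selection only at a high level. As a rough illustration of the general idea, and not the algorithm from the thesis, the Python sketch below assumes each phone reports an estimated time per training epoch, its battery level, and its free storage before a round. The server then gives each eligible client as many local epochs as fit within a round deadline. `ClientProfile`, `allocate_epochs`, and the default thresholds (20% battery, 150 MB model, 5 epochs) are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ClientProfile:
    """Hypothetical per-client resource snapshot reported before an FL round."""
    client_id: str
    seconds_per_epoch: float  # estimated local training time for one epoch
    battery_pct: float        # remaining battery, 0-100
    free_storage_mb: float    # free storage for the model checkpoint and data


def allocate_epochs(clients, round_deadline_s, max_epochs=5,
                    min_battery_pct=20.0, model_size_mb=150.0):
    """Return {client_id: epochs} for clients that can finish within the deadline.

    Clients below the battery floor, without room for the model, or unable to
    finish even one epoch before the deadline are left out. Every selected
    client gets as many epochs as fit in the deadline, capped at max_epochs.
    """
    plan = {}
    for c in clients:
        if c.battery_pct < min_battery_pct or c.free_storage_mb < model_size_mb:
            continue  # skip resource-starved devices
        if c.seconds_per_epoch <= 0:
            continue  # invalid timing estimate
        epochs = min(max_epochs, int(round_deadline_s // c.seconds_per_epoch))
        if epochs >= 1:
            plan[c.client_id] = epochs
    return plan


if __name__ == "__main__":
    clients = [
        ClientProfile("phone-a", 30.0, 80.0, 2048.0),
        ClientProfile("phone-b", 90.0, 65.0, 1024.0),
        ClientProfile("phone-c", 40.0, 10.0, 4096.0),   # battery too low
        ClientProfile("phone-d", 200.0, 90.0, 4096.0),  # too slow for one epoch
    ]
    print(allocate_epochs(clients, round_deadline_s=120.0))
    # {'phone-a': 4, 'phone-b': 1}
```

Bounding every client's work by the same deadline means the slowest selected phone still reports back on time, while faster phones do more local training instead of idling. The thesis also considers phone-specific capabilities and computation limits, which this sketch does not model.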

Files in this item

Name:: Thesis_final_submission.pdf
Size:: 5.315Mb
Format:: PDF
Description:: Thesis full text

This item appears in the following Collection(s)

Electronic Systems Engineering (ESE) [170]
