Show simple item record

dc.contributor.advisor: Prabhakar, T V
dc.contributor.author: Sasindran, Zitha
dc.date.accessioned: 2024-05-15T04:36:24Z
dc.date.available: 2024-05-15T04:36:24Z
dc.date.submitted: 2024
dc.identifier.uri: https://etd.iisc.ac.in/handle/2005/6510
dc.description.abstract: Automatic Speech Recognition (ASR) systems, pivotal in applications ranging from voice assistants to transcription services and assistive technologies, need improvement when confronted with diverse user voice patterns. Addressing this challenge involves enhancing existing systems by integrating user voice adaptation techniques. Implementing these techniques directly on edge devices presents inherent challenges, but it keeps the sensitive information in spoken utterances private. Our research attempts to bridge the gap between theoretical advancements and real-world implementation by focusing on the translation of user voice adaptation techniques into fully operational systems. By navigating the challenges posed by edge devices, our work aims to contribute to the development of robust, real-world ASR systems that can serve as a cornerstone for improved speech recognition. In our first work, we propose a resource-aware framework for user voice personalization of ASR models on constrained edge devices such as mobile phones. We consider the memory and battery capabilities of the devices to make informed decisions about the most suitable sub-model to train when resources are limited. In our second work, we introduce a new Federated Learning (FL) framework designed for edge devices to collaboratively train ASR models. We elaborate on the entire methodology for deploying the model with FL functionalities and provide a thorough evaluation of the framework in a real-world setup using actual mobile phones as client devices. In our third work, we introduce a client selection algorithm for FL that optimizes waiting time by considering system resources, including computation, storage, power, and phone-specific capabilities of client devices. Our algorithm dynamically adjusts the number of training epochs for selected clients based on available resources, thereby minimizing waiting times in the FL process. In our fourth work, we introduce a novel semi-asynchronous FL approach for edge devices. We calculate the time for aggregating the weights at the server with the help of our resource-aware work allocation algorithm with a partial modeling approach. This strategy helps mitigate staleness in practical asynchronous FL setups. Our next work addresses ASR errors by enhancing the decoding algorithm and introducing an error correction algorithm that uses token-based language models and pronunciation models. The errors frequently observed in ASR output include mistakes in word boundary disambiguation, phonetically ambiguous words, and spelling errors, among others. Driven by the limitations of the current standard evaluation metrics in ASR tasks, we present two approaches aimed at developing improved evaluation metrics for ASR systems. Finally, we put forward two metrics, Heval and SeMaScore, and demonstrate their effectiveness in evaluating ASR systems, particularly when confronted with atypical speech patterns. (en_US)
dc.language.iso: en_US (en_US)
dc.relation.ispartofseries: ;ET00519
dc.rights: I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation (en_US)
dc.subject: Automatic Speech Recognition (en_US)
dc.subject: edge devices (en_US)
dc.subject: on-device training (en_US)
dc.subject: federated learning (en_US)
dc.subject.classification: Research Subject Categories::TECHNOLOGY::Electrical engineering, electronics and photonics::Electronics (en_US)
dc.title: Automatic Speech Recognition Solutions For Resource Constrained Devices (en_US)
dc.type: Thesis (en_US)
dc.degree.name: PhD (en_US)
dc.degree.level: Doctoral (en_US)
dc.degree.grantor: Indian Institute of Science (en_US)
dc.degree.discipline: Engineering (en_US)

