Show simple item record

dc.contributor.advisor	Ghosh, Prasanta Kumar
dc.contributor.author	Jayakumar, Anjali
dc.date.accessioned	2025-12-11T05:22:27Z
dc.date.available	2025-12-11T05:22:27Z
dc.date.submitted	2025
dc.identifier.uri	https://etd.iisc.ac.in/handle/2005/7696
dc.description.abstract	Amyotrophic Lateral Sclerosis (ALS) is a neurodegenerative disorder characterized by motor neuron degeneration, leading to muscle weakness, atrophy, and speech impairments. Dysarthria, an early symptom in approximately 30% of ALS patients, often presents with hypernasality due to velopharyngeal dysfunction, which is observed in around 73.88% of individuals with bulbar-onset ALS. These speech impairments significantly affect communication and quality of life. Current ALS monitoring methods, such as clinical assessments, genetic testing, electromyography (EMG), and magnetic resonance imaging (MRI), are time-consuming and invasive. Speech-based approaches, in contrast, offer a non-invasive and efficient alternative; however, the lack of large ALS-specific speech datasets hinders model development. This study aims to develop a simplified, low-complexity model that distinguishes ALS speech from healthy control (HC) speech, using hypernasality as a key indicator and avoiding the need for large ALS-specific datasets. The study begins by investigating hypernasality in ALS speech across varying dysarthria severities, using HuBERT (Hidden Unit BERT) representations and Mel-frequency cepstral coefficient (MFCC) features. Next, the research focuses on simplifying deep learning models trained in the traditional manner on ALS and HC data, moving from complex Convolutional Neural Network (CNN) and Bidirectional Long Short-Term Memory (BiLSTM) models to simpler Deep Neural Networks (DNNs) and Support Vector Machines (SVMs). These models are trained on HuBERT representations and on MFCCs and their derivatives (deltas and double-deltas) as features, with various temporal statistics explored. The individual components and coefficients of the MFCCs and their derivatives are also analyzed separately to reduce feature dimensionality and computational cost. The study further integrates hypernasality into ALS vs. HC classification by training a nasal vs. non-nasal phoneme classifier on healthy speech data; this model assigns ALS speech to the nasal class and HC speech to the non-nasal class, demonstrating its effectiveness in distinguishing the two. Finally, the study compares classification accuracies with and without nasality for varying sizes of the ALS dataset, exploring the potential of nasality to provide reliable classification when ALS data are limited. The results show that nasality increases with disease severity, as observed through both experimental results and perceptual analysis. Using the traditional method with HuBERT representations as features, the CNN-BiLSTM model achieves an average accuracy of 85.18% on the Spontaneous Speech (SPON) task and 85.21% on the Diadochokinetic Rate (DIDK) task. The SVM model's accuracy is lower by 7.87% for SPON and 6.50% for DIDK, but it requires significantly fewer resources: only 769 parameters and 1,536 floating-point operations (FLOPs), compared to the CNN-BiLSTM's 1,761,032 parameters and 2,840,000 FLOPs. MFCC features achieve accuracy comparable to HuBERT, with an average of 77.24% for SPON and 77.21% for DIDK using an SVM with only 37 parameters and 72 FLOPs, versus 769 parameters and 1,536 FLOPs for HuBERT. Dimensionality reduction of the MFCCs further minimizes complexity: the individual delta and double-delta coefficients give the highest SVM accuracy, 78.24% for SPON and 78.16% for DIDK, using only 2 parameters and 2 FLOPs. When nasality is used as an indicator, the CNN-BiLSTM model achieves a maximum accuracy of 68.63% on the SPON task, while the SVM model achieves 70.15% at much lower complexity (769 parameters and 1,536 FLOPs versus 1,761,032 parameters and 2,840,000 FLOPs for the CNN-BiLSTM). Similarly, for the DIDK task, the CNN-BiLSTM reaches 80.74% accuracy, while the DNN and SVM models provide comparable accuracy at significantly reduced computational cost. The nasality-based method maintains relatively stable accuracy across dataset sizes, outperforming the traditional method by 2-6% for SPON and 2-10% for DIDK when only 10% of the dataset is used, and achieving up to a 3% improvement for DIDK with 40% of the data.	en_US
dc.language.iso	en_US	en_US
dc.relation.ispartofseries	;ET01172
dc.rights	I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.	en_US
dc.subject	Speech Signal Processing	en_US
dc.subject	Machine Learning	en_US
dc.subject	Amyotrophic Lateral Sclerosis	en_US
dc.subject	Hypernasality	en_US
dc.subject	Dysarthria	en_US
dc.subject.classification	Research Subject Categories::TECHNOLOGY::Electrical engineering, electronics and photonics::Electrical engineering	en_US
dc.title	Speech Based Low-Complexity Classification of Patients with Amyotrophic Lateral Sclerosis from Healthy Controls: Exploring the Role of Hypernasality	en_US
dc.type	Thesis	en_US
dc.degree.name	MTech (Res)	en_US
dc.degree.level	Masters	en_US
dc.degree.grantor	Indian Institute of Science	en_US
dc.degree.discipline	Engineering	en_US

