    Speech Based Low-Complexity Classification of Patients with Amyotrophic Lateral Sclerosis from Healthy Controls: Exploring the Role of Hypernasality

    View/Open
    Thesis full text (2.956Mb)
    Author
    Jayakumar, Anjali
    Abstract
    Amyotrophic Lateral Sclerosis (ALS) is a neurodegenerative disorder characterized by motor neuron degeneration, leading to muscle weakness, atrophy, and speech impairments. Dysarthria, an early symptom in approximately 30% of ALS patients, often presents with hypernasality due to velopharyngeal dysfunction, which is observed in around 73.88% of individuals with bulbar-onset ALS. These speech impairments significantly impact communication and quality of life. Current ALS monitoring methods, such as clinical assessments, genetic testing, electromyography (EMG), and magnetic resonance imaging (MRI), are time-consuming and invasive. In contrast, speech-based approaches provide a non-invasive and efficient alternative. However, the lack of large ALS-specific speech datasets hinders model development. This study aims to develop a simplified, low-complexity model to distinguish ALS speech from healthy control (HC) speech, using hypernasality as a key indicator and avoiding the need for large ALS-specific datasets.

    The study begins by investigating hypernasality in ALS speech across varying dysarthria severities, using HuBERT (Hidden Unit BERT) representations and Mel-frequency cepstral coefficient (MFCC) features. Next, the research focuses on simplifying deep learning models under the traditional method of training on the ALS and HC datasets, transitioning from complex Convolutional Neural Network (CNN) and Bidirectional Long Short-Term Memory (BiLSTM) models to simpler Deep Neural Networks (DNNs) and Support Vector Machines (SVMs). These models are trained on HuBERT representations, and on MFCCs and their derivatives (deltas and double-deltas), with various temporal statistics explored as features. The individual components and coefficients of the MFCCs and their derivatives are also analyzed separately to reduce feature dimensionality and computational cost. The study further integrates hypernasality into ALS vs. HC classification by training a nasal vs. non-nasal phoneme classifier on healthy speech data; this model assigns ALS speech to the nasal class and HC speech to the non-nasal class, and proves effective in distinguishing the two. Finally, the study compares classification accuracies with and without nasality across varying sizes of the ALS dataset, exploring the potential of nasality to provide reliable classification when ALS data is limited.

    The results show that nasality increases with disease severity, as observed through both experimental results and perceptual analysis. Using the traditional method with HuBERT representations as features, the CNN-BiLSTM model achieves an average accuracy of 85.18% for the Spontaneous Speech (SPON) task and 85.21% for the Diadochokinetic Rate (DIDK) task; the SVM model's accuracy is lower by 7.87% for SPON and 6.50% for DIDK, but it requires far fewer resources, with only 769 parameters and 1,536 floating-point operations (FLOPs) compared to the CNN-BiLSTM's 1,761,032 parameters and 2,840,000 FLOPs. MFCC features achieve accuracy similar to HuBERT, with average SVM accuracies of 77.24% for SPON and 77.21% for DIDK using only 37 parameters and 72 FLOPs, compared to HuBERT's 769 parameters and 1,536 FLOPs. Dimensionality reduction of the MFCCs further minimizes complexity, with individual delta and double-delta coefficients giving the highest SVM accuracies of 78.24% for SPON and 78.16% for DIDK using only 2 parameters and 2 FLOPs.
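    As an illustration of the low-complexity MFCC-plus-SVM pipeline described above, the snippet below shows one plausible way to compute MFCCs with their deltas and double-deltas, pool them with a temporal statistic (here the mean), and feed the result to a linear SVM. It is a minimal sketch, not the thesis code: librosa and scikit-learn, the 12-coefficient setting, mean pooling, and all function and variable names are assumptions made for illustration only.

    # Minimal sketch (assumed tools: librosa, scikit-learn); not the author's implementation.
    import numpy as np
    import librosa
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import LinearSVC

    def utterance_features(path, n_mfcc=12, sr=16000):
        """Mean-pooled MFCCs, deltas, and double-deltas for one utterance (36-dim)."""
        y, _ = librosa.load(path, sr=sr)
        mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)    # (12, T)
        delta = librosa.feature.delta(mfcc)                       # (12, T)
        delta2 = librosa.feature.delta(mfcc, order=2)              # (12, T)
        feats = np.vstack([mfcc, delta, delta2])                   # (36, T)
        return feats.mean(axis=1)                                  # temporal statistic: mean over frames

    # Hypothetical lists of wav paths and labels (1 = ALS, 0 = HC):
    # X = np.stack([utterance_features(p) for p in wav_paths])
    # clf = make_pipeline(StandardScaler(), LinearSVC())
    # clf.fit(X, labels)
    # A linear classifier on a 36-dimensional vector has 36 weights + 1 bias = 37
    # parameters, consistent with the MFCC parameter count quoted in the abstract.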
    On using nasality as an indicator, for the SPON task the CNN-BiLSTM model achieves a maximum accuracy of 68.63%, while the SVM model achieves 70.15% accuracy at much lower complexity (769 parameters and 1,536 FLOPs versus 1,761,032 parameters and 2,840,000 FLOPs for the CNN-BiLSTM). Similarly, for the DIDK task the CNN-BiLSTM reaches 80.74% accuracy, while the DNN and SVM models provide comparable accuracy at significantly reduced computational cost. The nasality-based method maintains relatively stable accuracy across different dataset sizes, outperforming the traditional method by 2-6% for SPON and 2-10% for DIDK when using only 10% of the dataset, and achieving up to a 3% improvement for DIDK with 40% of the data.
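    The nasality-based route described above trains only on healthy speech and then reuses the nasal vs. non-nasal decision to separate ALS from HC. A minimal sketch of that idea, assuming frame-level features (e.g. HuBERT or MFCC vectors) and scikit-learn, is given below; the mean nasal probability, the threshold, and all variable names are illustrative assumptions, not details taken from the thesis.

    # Minimal sketch of the nasality-transfer idea; all names and the threshold are hypothetical.
    import numpy as np
    from sklearn.pipeline import make_pipeline
    from sklearn.preprocessing import StandardScaler
    from sklearn.svm import SVC

    # Stage 1: nasal (1) vs. non-nasal (0) phoneme classifier trained on healthy speech frames.
    nasal_clf = make_pipeline(StandardScaler(), SVC(probability=True))
    # nasal_clf.fit(healthy_frames, nasal_labels)    # hypothetical frame-level training data

    def nasality_score(utterance_frames):
        """Average nasal-class probability over an utterance's frames."""
        return nasal_clf.predict_proba(utterance_frames)[:, 1].mean()

    # Stage 2: treat high nasality as ALS-like and low nasality as HC-like.
    # threshold = 0.5                                # hypothetical, tuned on held-out data
    # prediction = "ALS" if nasality_score(frames) > threshold else "HC"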
    URI
    https://etd.iisc.ac.in/handle/2005/7696
    Collections
    • Electrical Engineering (EE) [408]

    etd@IISc is a joint service of SERC & J R D Tata Memorial (JRDTML) Library || Powered by DSpace software || DuraSpace
    Contact Us | Send Feedback | Thesis Templates
    Theme by Atmire NV