    Fragile Interpretations and Interpretable models in NLP

    View/Open
    Thesis full text (2.047Mb)
    Author
    Kohli, Ayushi
    Abstract
    Deployment of deep learning models in critical areas, where the cost of a wrong decision can be a substantial financial loss, as in the banking domain, or even loss of life, as in the medical field, remains significantly limited. We cannot entirely rely on deep learning models because they act as black boxes for us. This problem can be addressed by Explainable AI, which aims to explain these black boxes. There are two approaches to explaining them: post-hoc explainability techniques and inherently interpretable models. These two approaches form the basis of our work. In the first part, we discuss the instability of post-hoc explanations, which leads to fragile interpretations. This work focuses on the robustness of NLP models along with the robustness of their interpretations. We propose an algorithm that perturbs the input text such that the generated text is semantically, conceptually, and grammatically similar to the input text, yet the interpretations produced are fragile. Through our experiments, we show how the interpretations of two very similar sentences can vary significantly. We show that post-hoc explanations can be unstable, inconsistent, unfaithful, and fragile, and therefore cannot be trusted. Finally, we conclude whether to trust the robust NLP models or the post-hoc explanations. In the second part, we design two inherently interpretable models: one for the offensive language detection task in a multi-task learning setting with three hierarchically related subtasks, and the other for the question pair similarity task. Our offensive language detection model achieves an F1 score of 0.78 on the OLID dataset and 0.85 on the SOLID dataset. Our question pair similarity model achieves an F1 score of 0.83. We also provide a detailed analysis of model interpretability as well as prediction interpretability.
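    The following is a minimal illustrative sketch, not the algorithm from the thesis: it assumes a hypothetical bag-of-words attribution function, hand-picked weights, and a manual near-synonym substitution, and only shows one simple way to quantify how much a post-hoc explanation changes between two near-identical sentences, here via the overlap of their top-k most important tokens.

    import numpy as np

    def toy_attribution(tokens, weights):
        """Hypothetical token-importance scores from a bag-of-words linear model."""
        return np.array([weights.get(t.lower(), 0.0) for t in tokens])

    def top_k_overlap(scores_a, scores_b, k=2):
        """Fraction of the k most important token positions shared by two explanations."""
        top_a = set(np.argsort(-np.abs(scores_a))[:k])
        top_b = set(np.argsort(-np.abs(scores_b))[:k])
        return len(top_a & top_b) / k

    # Illustrative weights standing in for a trained sentiment classifier (assumption).
    weights = {"terrible": -2.0, "awful": -1.9, "service": 0.3, "the": 0.0, "was": 0.0}

    original = "The service was terrible".split()
    perturbed = "The service was awful".split()  # near-synonym substitution keeps the meaning

    overlap = top_k_overlap(toy_attribution(original, weights),
                            toy_attribution(perturbed, weights))

    # A low overlap for near-identical inputs would indicate a fragile explanation.
    print(f"Top-k overlap between explanations: {overlap:.2f}")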
    URI
    https://etd.iisc.ac.in/handle/2005/6244
    Collections
    • Computer Science and Automation (CSA) [394]

    etd@IISc is a joint service of SERC & J R D Tata Memorial (JRDTML) Library || Powered by DSpace software || DuraSpace