
dc.contributor.advisor   Susheela Devi, V
dc.contributor.author    Kohli, Ayushi
dc.date.accessioned      2023-10-12T07:03:21Z
dc.date.available        2023-10-12T07:03:21Z
dc.date.submitted        2023
dc.identifier.uri        https://etd.iisc.ac.in/handle/2005/6244
dc.description.abstract  Deep learning models are deployed significantly less often in critical areas where the cost of a wrong decision is a substantial financial loss, as in the banking domain, or even loss of life, as in the medical field. We cannot rely entirely on deep learning models, as they act as black boxes for us. This problem can be addressed by Explainable AI, which aims to explain these black boxes. There are two approaches to explaining them: post-hoc explainability techniques and inherently interpretable models. These two approaches form the basis of our work. In the first part, we discuss the instability of post-hoc explanations, which leads to fragile interpretations. This work focuses on the robustness of NLP models along with the robustness of their interpretations. We have proposed an algorithm that perturbs the input text such that the generated text is semantically, conceptually, and grammatically similar to the input text, yet the interpretations produced are fragile. Through our experiments, we have shown how the interpretations of two very similar sentences can vary significantly. We have shown that post-hoc explanations can be unstable, inconsistent, unfaithful, and fragile, and therefore cannot be trusted. Finally, we conclude whether the robust NLP models or the post-hoc explanations should be trusted. In the second part, we have designed two inherently interpretable models: one for the offensive language detection task, cast as multi-task learning over three hierarchically related subtasks, and the other for the question pair similarity task. Our offensive language detection model achieved an F1 score of 0.78 on the OLID dataset and 0.85 on the SOLID dataset. Our question pair similarity model achieved an F1 score of 0.83. We also provide a detailed analysis of model interpretability as well as prediction interpretability.  en_US
dc.language.iso          en_US  en_US
dc.relation.ispartofseries  ;ET00257
dc.rights                I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.  en_US
dc.subject               deep learning models  en_US
dc.subject               Explainable AI  en_US
dc.subject               Fragile interpretations  en_US
dc.subject               NLP  en_US
dc.subject               Artificial Intelligence  en_US
dc.subject.classification  Research Subject Categories::TECHNOLOGY::Information technology::Computer science  en_US
dc.title                 Fragile Interpretations and Interpretable models in NLP  en_US
dc.type                  Thesis  en_US
dc.degree.name           MTech (Res)  en_US
dc.degree.level          Masters  en_US
dc.degree.grantor        Indian Institute of Science  en_US
dc.degree.discipline     Engineering  en_US

