dc.contributor.advisor  Talukdar, Partha Pratim
dc.contributor.author  Kumar, Sawan
dc.date.accessioned  2023-03-17T05:08:08Z
dc.date.available  2023-03-17T05:08:08Z
dc.date.submitted  2022
dc.identifier.uri  https://etd.iisc.ac.in/handle/2005/6044
dc.description.abstract  Traditional strategies for building machine-learning-based classification systems employ discrete labels as targets. This limits the usefulness of such systems in two ways. First, the generalizability of these systems is limited to labels present and well represented in the training data. Second, with increasingly large neural network models gaining acceptance, supervision with discrete labels alone does not provide a straightforward interface for generating explanations of the decisions such systems make. Natural Language (NL) Supervision (NLS), in the form of task descriptions, examples, label descriptions and explanations for labelling decisions, provides a way to overcome these bottlenecks. Working in this paradigm, we propose novel methods for improving data-efficiency and trustworthiness:

(1) [Data Efficiency using NLS] Word Sense Disambiguation (WSD) using Sense Definition Embeddings: WSD, a long-standing open problem in Natural Language Processing (NLP), typically comes with small training corpora and long-tailed label distributions. Existing supervised methods did not generalize well to rare or unseen classes, while NL-supervision-based systems performed worse on overall (standard) evaluation benchmarks. We propose Extended WSD Incorporating Sense Embeddings (EWISE), a supervised model that performs WSD by predicting over a continuous sense embedding space rather than a discrete label space. This allows EWISE to generalize over both seen and unseen senses, thus achieving generalized zero-shot learning. To obtain target sense embeddings, EWISE utilizes NL sense definitions along with external knowledge from WordNet relations. EWISE achieved new state-of-the-art WSD performance at the time of publication, specifically by improving zero-shot and few-shot learning.

(2) [Trustworthiness using NLS] Natural Language Inference (NLI) with Faithful NL Explanations: Generated NL explanations are expected to be faithful, i.e., they should correlate well with the model's internal decision making. In this work, we focus on the task of NLI and address the following question: can we build NLI systems that produce labels with high accuracy while also generating faithful explanations of their decisions? We propose Natural-language Inference over Label-specific Explanations (NILE), a novel NLI method which utilizes auto-generated label-specific NL explanations to produce a label along with its faithful explanation. Our evaluation of NILE also supports the claim that accurate systems capable of providing testable explanations of their decisions can be designed.

(3) [Improving the NLS interface of Large Language Models (LLMs)] LLMs, pre-trained on large unlabelled corpora, have proven successful as zero-shot and few-shot learners on downstream tasks using only a textual interface, which enables a promising NLS interface. A typical usage augments an input example with priming text comprising task descriptions and training examples, and processes the output probabilities to make predictions. In this work, we further explore priming-based few-shot learning and make the following contributions: (a) Reordering Examples Helps during Priming-based Few-Shot Learning: We show that presenting training examples in the right order is key to generalization. We introduce PERO (Prompting with Examples in the Right Order), where we formulate few-shot learning as a search over the set of permutations of the training examples. We demonstrate the effectiveness of the proposed method on the tasks of sentiment classification, natural language inference and fact retrieval, and show that PERO can learn to generalize efficiently using as few as 10 examples, in contrast to existing approaches. (b) Answer-level Calibration (ALC) Helps Free-form Multiple Choice Question Answering (QA): We consider the QA format in which, given a context, one must choose from a set of free-form textual choices of unspecified lengths. We present ALC, whose main idea is to model context-independent biases in terms of the probability of a choice without the associated context, and to subsequently remove these biases using an unsupervised estimate of similarity with the full context. ALC improves zero-shot and few-shot performance on several benchmarks while also providing a more reliable estimate of performance.  en_US
dc.language.iso  en_US  en_US
dc.relation.ispartofseries  ;ET00059
dc.rights  I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.  en_US
dc.subject  Artificial Intelligence  en_US
dc.subject  Machine Learning  en_US
dc.subject  Natural Language Processing  en_US
dc.subject  Word Sense Disambiguation  en_US
dc.subject.classification  Research Subject Categories::TECHNOLOGY::Information technology::Computer science::Computer science  en_US
dc.title  Methods for Improving Data-efficiency and Trustworthiness using Natural Language Supervision  en_US
dc.type  Thesis  en_US
dc.degree.name  PhD  en_US
dc.degree.level  Doctoral  en_US
dc.degree.grantor  Indian Institute of Science  en_US
dc.degree.discipline  Engineering  en_US
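
The abstract above outlines four methods; minimal sketches follow, one per method. First, the definition-embedding idea behind EWISE: score candidate senses by similarity between a context representation and embeddings of their NL definitions, so that senses unseen in training remain scorable. The hashed bag-of-words embed function below is a toy stand-in for the thesis's trained context and definition encoders, not the actual EWISE models.

import numpy as np

def embed(text, dim=64):
    # Toy hashed bag-of-words embedding; a stand-in for EWISE's trained
    # encoders, used only so this sketch runs end to end.
    v = np.zeros(dim)
    for tok in text.lower().split():
        v[hash(tok) % dim] += 1.0
    return v / (np.linalg.norm(v) + 1e-8)

def disambiguate(sentence, candidate_senses):
    # Score every candidate sense in a shared embedding space. Unseen senses
    # remain scorable because they still have NL definitions, which is what
    # enables generalized zero-shot WSD.
    ctx = embed(sentence)
    return max(candidate_senses,
               key=lambda s: float(ctx @ embed(candidate_senses[s])))

# Usage: gloss strings are paraphrased for illustration.
senses = {
    "bank.n.01": "sloping land along the side of a river",
    "bank.n.02": "a financial institution that accepts deposits",
}
print(disambiguate("she sat on the bank of the river", senses))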
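Second, the NILE pipeline can be pictured as label-specific explanation generators followed by a selector that scores each (premise, hypothesis, explanation) triple. The generators and selector below are hypothetical interfaces standing in for the thesis's trained components.

LABELS = ("entailment", "contradiction", "neutral")

def nile_predict(premise, hypothesis, generators, selector):
    # generators: label -> callable(premise, hypothesis) -> explanation text
    # selector:   callable(premise, hypothesis, explanation) -> score
    candidates = {l: generators[l](premise, hypothesis) for l in LABELS}
    best = max(LABELS,
               key=lambda l: selector(premise, hypothesis, candidates[l]))
    # Returning the winning label together with the explanation that won it
    # ties the explanation to the decision, which is what makes it testable.
    return best, candidates[best]

# Toy usage with placeholder generators and a word-overlap selector.
toy_gens = {l: (lambda p, h, l=l: f"'{h}' is {l} given '{p}'") for l in LABELS}
toy_sel = lambda p, h, e: len(set(e.split()) & set(p.split()))
print(nile_predict("A man is sleeping", "A person rests", toy_gens, toy_sel))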
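Third, PERO's formulation of few-shot learning as a search over permutations of the training examples. The exhaustive enumeration here is only feasible for a handful of examples and may differ from the thesis's actual search strategy; score_fn is a hypothetical stand-in for querying an LLM and measuring accuracy on held-out data.

from itertools import permutations

def build_prompt(ordered_examples, query):
    # Concatenate training examples as priming text, then append the query.
    demos = "\n".join(f"Input: {x}\nLabel: {y}" for x, y in ordered_examples)
    return f"{demos}\nInput: {query}\nLabel:"

def pero_search(train_examples, score_fn):
    # Treat the order of priming examples as the object being learned:
    # evaluate each permutation and keep the best-scoring one.
    best_order, best_score = None, float("-inf")
    for order in permutations(train_examples):
        score = score_fn(list(order))
        if score > best_score:
            best_order, best_score = list(order), score
    return best_order

# Toy usage: a placeholder scorer, standing in for LLM-based dev accuracy.
train = [("great film", "positive"), ("dull plot", "negative"),
         ("loved it", "positive")]
toy_score = lambda order: -sum(i for i, (x, _) in enumerate(order)
                               if "dull" in x)
print(build_prompt(pero_search(train, toy_score), "fine acting"))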
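Finally, the bias-removal step in ALC: the probability of an answer choice without the context estimates its context-independent bias, which is then subtracted from the in-context score. This simplified sketch omits the unsupervised similarity estimate the abstract mentions, and lm_logprob is a hypothetical log-probability interface, not an actual library call.

def alc_choose(context, choices, lm_logprob):
    # lm_logprob(prompt, continuation) -> log p(continuation | prompt);
    # a hypothetical interface to a language model.
    def calibrated(choice):
        in_context = lm_logprob(context, choice)
        # The choice's probability under an empty context estimates its
        # context-independent bias (length, frequency, phrasing).
        context_free = lm_logprob("", choice)
        return in_context - context_free
    return max(choices, key=calibrated)

# Toy usage with a fake LM: overlap with the prompt, minus a length penalty.
fake_lm = lambda prompt, cont: (len(set(prompt.split()) & set(cont.split()))
                                - 0.1 * len(cont.split()))
print(alc_choose("the capital of France is",
                 ["the city of Paris", "a large city"], fake_lm))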

