dc.contributor.advisor: Shevade, Shirish
dc.contributor.advisor: Kanade, Aditya
dc.contributor.author: Mandal, Madhurima
dc.date.accessioned: 2022-08-25T04:38:34Z
dc.date.available: 2022-08-25T04:38:34Z
dc.date.submitted: 2022
dc.identifier.uri: https://etd.iisc.ac.in/handle/2005/5834
dc.description.abstract: During software development, developers need to ensure that their code is bug-free and that best coding practices are followed. To do so, they need answers to queries about specific aspects of the code. Powerful code-query languages such as CodeQL have been developed for this purpose. Using such languages, however, requires expertise in writing formal queries, and each separate query takes several lines of code-query language to express. To remedy these problems, we propose to represent each query by a natural language phrase and to answer such queries using neural networks. We aim to train a single model that can answer multiple queries, as opposed to writing a separate formal query for each task. Such a model can answer these queries against unseen code. With this motivation, we introduce the novel NlCodeQA dataset. It includes 171,346 labeled examples, where each input consists of a natural language query and a code snippet. The labels are answer spans in the input code snippet with respect to the input query. State-of-the-art BERT-style neural architectures were trained on the NlCodeQA data. Preliminary experimental results show that the proposed model achieves an exact match accuracy of 86.30%. The proposed use of natural language queries and neural models for query understanding will help increase the productivity of software developers and pave the way for machine-learning-based code analysis tools that complement existing code analysis systems for complex code queries that are either hard or expensive to represent in a formal query language. [en_US]
dc.language.iso: en_US [en_US]
dc.rights: I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation. [en_US]
dc.subject: Natural language processing, question answering, software engineering [en_US]
dc.subject: Natural language processing [en_US]
dc.subject: question answering [en_US]
dc.subject: software engineering [en_US]
dc.subject: SQL [en_US]
dc.subject.classification: Research Subject Categories::TECHNOLOGY::Information technology [en_US]
dc.title: Neural Approaches for Natural Language Query Answering over Source Code [en_US]
dc.type: Thesis [en_US]
dc.degree.name: MTech (Res) [en_US]
dc.degree.level: Masters [en_US]
dc.degree.grantor: Indian Institute of Science [en_US]
dc.degree.discipline: Engineering [en_US]

