dc.contributor.advisor: Shevade, Shirish
dc.contributor.advisor: Kanade, Aditya
dc.contributor.author: Sahu, Surya Prakash
dc.date.accessioned: 2023-06-26T07:01:50Z
dc.date.available: 2023-06-26T07:01:50Z
dc.date.submitted: 2023
dc.identifier.uri: https://etd.iisc.ac.in/handle/2005/6139
dc.description.abstract: Software developers often make queries about the security, performance effectiveness, and maintainability of their code. Through an iterative debugging process, developers analyze the code to find answers to these queries. The process can be seen as a question-answering task that requires developers to identify code spans satisfying certain properties. Many of these queries can be answered by existing code analysis tools such as CodeQL. However, using such tools requires design, implementation, and verification effort. In this work, we propose an alternative to code analysis tools by formulating the task of query answering over source code as a span prediction problem. In the proposed approach, a neural model is designed to predict the appropriate answer spans in code in response to a query. The model also identifies the supporting facts required to justify the predicted answers. Pre-trained language models for code are fine-tuned on a newly prepared, challenging dataset, CodeQueries, for query answering over source code. We demonstrate that the proposed approach performs well on this task when only relevant code blocks are provided as input to the model. Experiments conducted on the dataset demonstrate that the proposed neural approach is robust to noisy span labeling and can even handle code with minor syntax errors. Although large code and limited training examples adversely affect model performance, we suggest methods to address these issues. Based on our study, we believe that the proposed neural approach will be an additional tool in a developer's toolbox for query answering over source code. [en_US]
dc.language.iso: en_US [en_US]
dc.relation.ispartofseries: ;ET00152
dc.rights: I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation. [en_US]
dc.subject: Natural Language Processing [en_US]
dc.subject: Extractive Question-Answering [en_US]
dc.subject: Code Understanding [en_US]
dc.subject.classification: Research Subject Categories::TECHNOLOGY::Information technology::Computer science::Software engineering [en_US]
dc.title: CodeQueries: Benchmarking Query Answering over Source Code [en_US]
dc.type: Thesis [en_US]
dc.degree.name: MTech (Res) [en_US]
dc.degree.level: Masters [en_US]
dc.degree.grantor: Indian Institute of Science [en_US]
dc.degree.discipline: Engineering [en_US]

