Deep Learning for Bug Localization and Program Repair

Gupta, Rahul

dc.contributor.advisor	Kanade, Aditya
dc.contributor.author	Gupta, Rahul
dc.date.accessioned	2020-03-05T05:24:18Z
dc.date.available	2020-03-05T05:24:18Z
dc.date.submitted	2020
dc.identifier.uri	https://etd.iisc.ac.in/handle/2005/4350
dc.description.abstract	In this thesis, we focus on the problem of program debugging and present novel deep learning based techniques for bug-localization and program repair. Deep learning techniques have been successfully applied to a variety of tasks in natural language processing over the years. Although natural language text and programs are similar to some extent, the latter have procedural interpretation and richer structure. Applying deep learning techniques to programs presents many novel challenges, which arise due to these differences. We address some of these challenges in this thesis. Most of the existing program debugging research is dominated by formal, theory-first approaches. These approaches fail to take advantage of the existing codebases available online in the form of open source software repositories and student assignment submissions to massive open online courses on programming. Recognizing this, researchers have begun to replace expert-designed heuristics with models learned from codebases to improve the performance of the conventional debugging techniques. This thesis shows that it is possible to solve program debugging problems directly from raw programs using deep learning techniques in an end-to-end manner. More specifically, we present three approaches for bug-localization and program repair that are entirely data-driven and learn to perform their task instead of following the steps specified by a domain expert. We first introduce the notion of common programming errors and present a deep neural network based end-to-end technique, called DeepFix, that can fix multiple such errors in a program without relying on any external tool to locate or fix them. At the heart of DeepFix is a multi-layered sequence-to-sequence neural network equipped with an attention mechanism, comprising an encoder recurrent neural network (RNN) to process the input and a decoder RNN that generates the output by attending to the encoded input. The network is trained on a labeled dataset to predict a faulty program location along with the corrected program statement. Multiple errors in a program can be fixed by invoking DeepFix iteratively. Our experiments demonstrate that DeepFix is effective and fixes thousands of programs. While repositories containing erroneous programs are easily available, the labels of these programs (faulty program location and correct statement) required by DeepFix are not easily available. Labeling a large number of erroneous programs is a daunting task. To address this issue, we propose a novel deep reinforcement learning based technique, called RLAssist, that does not require labeled training data and still matches the performance of DeepFix. At the core of RLAssist is a novel programming language correction framework amenable to reinforcement learning. The framework allows an agent to mimic human actions for text navigation and editing. We demonstrate that the agent can be trained through self-exploration directly from the raw input, that is, the program text itself, without any prior knowledge of the formal syntax of the programming language. Reinforcement learning techniques are, however, usually slow to train. We also show that RLAssist can be trained much faster by leveraging expert demonstrations for as little as one-tenth of its training data, which also helps it in achieving better performance than DeepFix. Finally, we present a deep learning based technique for semantic bug-localization in programs with respect to failing tests. The proposed technique works in two phases. In the first phase, a novel tree convolutional neural network is used to predict whether a program passes or fails a given test. In the second phase, we query a state-of-the-art neural prediction attribution technique to find out which lines of the programs make the network predict the failures to localize the bugs. This is a static bug-localization technique which requires neither program instrumentation nor multiple executions, as is necessary for existing dynamic bug-localization techniques. Our experiments show that the proposed technique is competitive with two state-of-the-art program spectrum based and one syntactic difference based bug-localization baselines. All the techniques proposed in this thesis are programming language agnostic. We believe that the ideas and tools developed in this work can potentially be a road-map for future attempts at applying deep learning techniques to more problems in software engineering research.	en_US
dc.language.iso	en_US	en_US
dc.rights	I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation	en_US
dc.subject	Programming education	en_US
dc.subject	fault localization	en_US
dc.subject	software debugging	en_US
dc.subject.classification	Research Subject Categories::TECHNOLOGY::Information technology::Computer science::Software engineering	en_US
dc.title	Deep Learning for Bug Localization and Program Repair	en_US
dc.type	Thesis	en_US
dc.degree.name	PhD	en_US
dc.degree.level	Doctoral	en_US
dc.degree.grantor	Indian Institute of Science	en_US
dc.degree.discipline	Engineering	en_US

Files in this item

Name:: Rahul_Gupta-2020_10851.pdf
Size:: 968.5Kb
Format:: PDF
Description:: Thesis full text

View/Open

This item appears in the following Collection(s)

Computer Science and Automation (CSA) [394]

Show simple item record