Efficient Whole Program Path Tracing
Abstract
Obtaining an accurate whole program path (WPP) that captures a program’s runtime behaviour in terms of a control-flow trace has a number of well-known benefits, including opportunities for code optimization, bug detection, program analysis refinement, etc. Existing techniques to compute WPPs perform sub-optimal instrumentation resulting in significant space and time overheads. Our goal in this thesis is to minimize these overheads without losing precision.
To do so, we design a novel and scalable whole program analysis to determine instrumentation points used to obtain WPPs. Our approach is divided into three components: (a) an efficient summarization technique for inter-procedural path reconstruction, (b) specialized data structures called conflict sets that serve to effectively distinguish between pairs of paths, and (c) an instrumentation algorithm that computes the minimum number of edges to describe a path based on these conflict sets. We show that the overall problem is a variant of the minimum hitting set problem, which is NP-hard, and employ various sound approximation strategies to yield a practical solution.
We have implemented our approach and performed elaborate experimentation on Java programs from the DaCapo benchmark suite to demonstrate the efficacy of our approach across multiple dimensions. On average, our approach necessitates instrumenting only 9% of the total number of CFG edges in the program. The average runtime overhead incurred by our approach to collect WPPs is 1.97x, which is only 26% greater than the overhead induced by only instrumenting edges guaranteed to exist in an optimal solution. Furthermore, compared to the state-of-the-art, we observe a reduction in runtime overhead by an average and maximum factor of 2.8 and 5.4, respectively.