Precise Analysis of Private And Shared Caches for Tight WCET Estimates
Abstract
Worst Case Execution Time (WCET) is an important metric for programs running on real-time systems, and finding precise estimates of a program’s WCET is crucial to avoid over-allocation and wastage of hardware resources and to improve the schedulability of task sets. Hardware Caches have a major impact on a program’s execution time, and accurate estimation of a program’s cache behavior generally leads to significant reduction of its estimated WCET. However, the cache behavior of an access cannot be determined in isolation, since it depends on the access history, and in multi-path programs, the sequence of accesses made to the cache is not fixed. Hence, the same access can exhibit different cache behavior in different execution instances. This issue is further exacerbated in shared caches in a multi-core architecture, where interfering accesses from co-running programs on other cores can arrive at any time and modify the cache state. Further, cache analysis aimed towards WCET estimation should be provably safe, in that the estimated WCET should always exceed the actual execution time across all execution instances.
Faced with such contradicting requirements, previous approaches to cache analysis try to find memory accesses in a program which are guaranteed to hit the cache, irrespective of the program input, or the interferences from other co-running programs in case of a shared cache. To do so, they find the worst-case cache behavior for every individual memory access, analyzing the program (and interferences to a shared cache) to find whether there are execution instances where an access can super a cache miss. However, this approach loses out in making more precise predictions of private cache behavior which can be safely used for WCET estimation, and is significantly imprecise for shared cache analysis, where it is often impossible to guarantee that an access always hits the cache. In this work, we take a fundamentally different approach to cache analysis, by (1) trying to find worst-case behavior of groups of cache accesses, and (2) trying to find the exact cache behavior in the worst-case program execution instance, which is the execution instance with the maximum execution time.
For shared caches, we propose the Worst Case Interference Placement (WCIP) technique, which finds the worst-case timing of interfering accesses that would cause the maximum number of cache misses on the worst case execution path of the program. We first use Integer Linear Programming (ILP) to find an exact solution to the WCIP problem. However, this approach does not scale well for large programs, and so we investigate the WCIP problem in detail and prove that it is NP-Hard.
In the process, we discover that the source of hardness of the WCIP problem lies in finding the worst case execution path which would exhibit the maximum execution time in the presence of interferences. We use this observation to propose an approximate algorithm for performing WCIP, which bypasses the hard problem of finding the worst case execution path by simply assuming that all cache accesses made by the program occur on a single path. This allows us to use a simple greedy algorithm to distribute the interfering accesses by choosing those cache accesses which could be most affected by interferences. The greedy algorithm also guarantees that the increase in WCET due to interferences is linear in the number of interferences. Experimentally, we show that WCIP provides substantial precision improvement in the final WCET over previous approaches to shared cache analysis, and the approximate algorithm almost matches the precision of the ILP-based approach, while being considerably faster.
For private caches, we discover multiple scenarios where hit-miss predictions made by traditional Abstract Interpretation-based approaches are not sufficient to fully capture cache behavior for WCET estimation. We introduce the concept of cache miss paths, which are abstractions of program path along which an access can super a cache miss. We propose an ILP-based approach which uses cache miss paths to find the exact cache behavior in the worst-case execution instance of the program. However, the ILP-based approach needs information about the worst-case execution path to predict the cache behavior, and hence it is difficult to integrate it with other micro-architectural analysis. We then show that most of the precision improvement of the ILP-based approach can be recovered without any knowledge of the worst-case execution path, by a careful analysis of the cache miss paths themselves. In particular, we can use cache miss paths to find the worst-case behavior of groups of cache accesses. Further, we can find upper bounds on the maximum number of times that cache accesses inside loops can exhibit worst-case behavior. This results in a scalable, precise method for performing private cache analysis which can be easily integrated with other micro-architectural analysis.