Modelling program dynamics : A study on locality of reference
Abstract
Locality of reference is a phenomenon exhibited by programs and computer systems in general, and refers to the tendency to localize operations to a small subset of the available resources. In the past, this phenomenon has been studied in terms of temporal and spatial locality; studies of locality have been restricted to simple considerations that can be readily exploited to gain performance improvements. The objective of this thesis is to provide a general framework based on which locality studies can be conducted at various levels of detail, and investigate how this view can lead to improvements in design and evaluation methodology.
The principle of locality provides the basic framework which we generalize to a General Principle of Locality (GPL). Our GPL asserts that in a system displaying locality of reference:
(i) there will be non-uniform demand for resources,
(ii) the demand patterns will be correlated over time, and
(iii) if the demand pattern becomes uncorrelated over an interval, this signifies a transition in locality.
We classify resources as intrinsic or extrinsic, depending on the level at which the study is performed, and show that spatial locality is merely an artifact of the mapping between intrinsic and extrinsic resources. Based on the General Principle of Locality and our classification of resources, we propose necessary conditions that must be satisfied by a model of locality to be representative; we require that a representative model be able to accurately predict performance on various direct-mapped cache configurations. Further, none of the published models of program memory behaviour, when evaluated based on our necessary conditions, are found to be representative.
We introduce the concept of dynamic data clusters, which are formed by partitioning a data reference stream. Dynamic data clusters are used to identify locality phases in a program. A variation of the procedure used to cluster data addresses forms the basis for the design of a hardware unit for prefetching structural references in a program. Our evaluation of this structural prefetch scheme shows that it can reduce miss ratios by as much as 50%. Performance improves by as much as 75% when this scheme is used along with a strided prefetch scheme.
We developed a representative locality model for memory behaviour in a program phase, based on the GPL necessary conditions. The model was validated by comparing the cache miss ratios of a synthetically generated trace based on the model against the miss ratios for the corresponding phase of the actual program, across various direct-mapped cache configurations. The design of caches to meet a target performance is used to illustrate an application of our locality model.
The processing of the LRU stack, used in our model, is known to suffer from performance problems. We present four algorithms to speed up LRU stack processing and show that one of them, the Successive Depth Approximation algorithm, improves performance by as much as 50% over the best published algorithm which uses a splay tree to represent the stack.

