Performance analysis of methods that overcome false sharing effects in software DSMs
Abstract
Software Distributed Shared Memory (DSM) systems rely on virtual memory mechanisms to detect accesses to shared locations and maintain their consistency, so they support sharing at the granularity of a page, typically a few kilobytes. This inherently coarse sharing granularity induces high degrees of false sharing, in which processors accessing unrelated data that happen to reside on the same page nonetheless incur coherence actions. False sharing is especially severe in applications with fine-grain access patterns, and its overheads are recognized as the dominant factor limiting the performance of software DSMs.
Several methods have been proposed in the literature to reduce or eliminate false sharing. They broadly follow one of two approaches: the Multiple Writer approach and the emulated fine-grain sharing (EmFiGS) approach. However, no quantitative performance comparison of these methods has been reported.
In this thesis, we first present a novel, implementation-independent analysis that uses overhead counts to compare the two approaches. By accounting only for the overheads intrinsic to each method, and not those specific to a particular implementation, our analysis shows that in the EmFiGS approach the benefits of eliminating false sharing are far outweighed by the performance penalty of reduced exploitation of spatial locality. Consequently, any implementation of the EmFiGS approach is likely to perform significantly worse than the Multiple Writer approach.
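The trade-off behind this result can be illustrated with a toy miss-count model. The sketch below is not the thesis's analysis: it assumes a simple single-writer invalidation protocol with no synchronization, a hypothetical trace format `(processor, word address, is_write)`, a 512-word page (4 KB of 8-byte words), and an 8-word unit standing in for emulated fine-grain sharing. The Multiple Writer approach is not modeled, and the helper names `count_misses` and `build_trace` are illustrative only.

```python
from typing import List, Tuple

# Hypothetical trace entry: (processor id, word address, is_write).
Access = Tuple[int, int, bool]


def count_misses(trace: List[Access], unit_words: int, nprocs: int) -> int:
    """Count coherence misses in a toy single-writer invalidation model.

    Memory is split into coherence units of `unit_words` words. A processor
    misses when it touches a unit it holds no valid copy of; a write
    invalidates every other processor's copy of that unit.
    """
    valid = [set() for _ in range(nprocs)]  # units each processor holds valid
    misses = 0
    for proc, addr, is_write in trace:
        unit = addr // unit_words
        if unit not in valid[proc]:
            misses += 1  # fetch the unit from another node
            valid[proc].add(unit)
        if is_write:
            for other in range(nprocs):
                if other != proc:
                    valid[other].discard(unit)
    return misses


def build_trace(rounds: int, array_words: int, base: int) -> List[Access]:
    """Build a two-phase trace (assumes base >= 512 so the phases use different pages)."""
    trace: List[Access] = []
    # Phase 1: processors 0 and 1 repeatedly write distinct words (0 and 64)
    # that share a 512-word page but fall in different 8-word units.
    for _ in range(rounds):
        trace.append((0, 0, True))
        trace.append((1, 64, True))
    # Phase 2: processor 0 writes an array sequentially, then processor 1 reads it.
    for addr in range(base, base + array_words):
        trace.append((0, addr, True))
    for addr in range(base, base + array_words):
        trace.append((1, addr, False))
    return trace


if __name__ == "__main__":
    trace = build_trace(rounds=100, array_words=4096, base=4096)
    for unit in (512, 8):  # page-sized unit vs. emulated fine-grain unit
        print(unit, count_misses(trace, unit, nprocs=2))
    # prints "512 216" then "8 1026"
```

Under these assumptions the page-sized unit incurs 216 misses, 200 of them from the false sharing ping-pong in phase 1. The 8-word unit removes that ping-pong (2 misses) but incurs 1024 misses in phase 2, because the sequentially accessed array now spans 512 units instead of 8.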
We then use experimental evaluation to validate and complement our analysis. The measured overhead counts closely match those obtained from our analysis. The execution time results show that the EmFiGS approach performs worse than the Multiple Writer approach by a factor ranging from 1.5 to as much as 90. In many cases, the EmFiGS approach performs worse even than a single-writer lazy release consistent protocol, which experiences very high false sharing overheads. Our performance results establish that the high true sharing overheads stemming from reduced exploitation of spatial locality cause the poor performance of the EmFiGS approach.
The EmFiGS approach continues to perform worse than the Multiple Writer approach even after incorporating Tapeworm, a record-and-replay technique that fetches pages ahead of demand in an aggregated fashion, to alleviate the loss of spatial locality. Finally, we investigate the interplay between spatial locality exploitation and false sharing elimination in the EmFiGS approach across varying sharing granularities, and report the resulting trade-offs.
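Why aggregation helps but need not close the gap can be pictured with a simple message-count model. This is a hypothetical sketch, not Tapeworm's actual protocol: it assumes each demand fault costs one request and one reply, that units are assigned to home nodes round-robin, and that replaying the units recorded in the previous interval costs one request and one reply per home node contacted. The functions `on_demand_messages` and `record_replay_messages` are illustrative names.

```python
from typing import List, Set


def on_demand_messages(intervals: List[List[int]]) -> int:
    """Every fault on an invalid unit costs one request and one reply."""
    return sum(2 * len(set(needed)) for needed in intervals)


def record_replay_messages(intervals: List[List[int]], num_homes: int) -> int:
    """Toy record-and-replay: prefetch last interval's units in aggregated batches.

    Assumes units map to home nodes round-robin (unit % num_homes) and that an
    aggregated prefetch costs one request and one reply per distinct home node.
    """
    recorded: Set[int] = set()
    messages = 0
    for needed in intervals:
        needed_set = set(needed)
        # Replay: fetch everything recorded last time, batched per home node.
        homes = {unit % num_homes for unit in recorded}
        messages += 2 * len(homes)
        # Units not covered by the replay still fault on demand.
        messages += 2 * len(needed_set - recorded)
        # Record the units this interval needed, for the next replay.
        recorded = needed_set
    return messages


if __name__ == "__main__":
    # Ten intervals, each touching the same 64 fine-grain units.
    intervals = [list(range(64)) for _ in range(10)]
    print(on_demand_messages(intervals))         # 1280
    print(record_replay_messages(intervals, 4))  # 200
```

Under these assumptions, ten intervals that each touch 64 fine-grain units (for example, one 4 KB page split into 64-byte blocks, an assumed size) cost 1280 messages on demand and 200 with replay. Fetching the same data as one page per interval would cost 2 messages per interval, or 20 in total, in this model, so aggregation narrows the gap without closing it.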

