dc.description.abstract | Image processing pipelines are ubiquitous. Every image captured by a camera and every image uploaded on social networks like Google+or Facebook is processed by a pipeline. Applications in a wide range of domains like computational photography, computer vision and medical imaging use image processing pipelines. Many of these applications demand high-performance which requires effective utilization of modern architectures. Given the proliferation of camera enabled devices and social networks optimizing these emerging workloads has become important both at the data center and the embedded device scales.
An image processing pipeline can be viewed as a graph of interconnected stages which process images successively. Each stage typically performs one of point-wise, stencil, sam-pling, reduction or data-dependent operations on image pixels. Individual stages in a pipeline typically exhibit abundant data parallelism that can be exploited with relative ease. However, the stages also require high memory bandwidth preventing effective uti-lization of parallelism available on modern architectures. The traditional options are using optimized libraries like OpenCV or to optimize manually. While using libraries precludes optimization across library routines, manual optimization accounting for both parallelism and locality is very tedious.
Inthisthesis,wepresentthedesignandimplementationofPolyMage,adomain-specific language and compiler for image processing pipelines. The focus of the system is on au-tomatically generating high-performance implementations of image processing pipelines expressed in a high-level declarative language. We achieve such automation with:
• tiling techniques to improve parallelism and locality by introducing redundant computation,
v
a model-driven fusion heuristic which enables a trade-off between locality and re-dundant computations, and anautotuner whichleveragesthefusionheuristictoexploreasmallsubsetofpipeline implementations and find the best performing one.
Our optimization approach primarily relies on the transformation and code generation ca-pabilities of the polyhedral compiler framework. To the best of our knowledge, this is the first model-driven compiler for image processing pipelines that performs complex fusion, tiling, and storage optimization fully automatically. We evaluate our framework on a modern multicore system using a set of seven benchmarks which vary widely in structure and complexity. Experimental results show that the performance of pipeline implementations generated by our approach is:
• up to 1.81× better than pipeline implementations manually tuned using Halide, a state-of-the-art language and compiler for image processing pipelines,
• on average 5.39× better than pipeline implementations automatically tuned using Halide and OpenTuner, and
• on average 3.3× better than naive pipeline implementations which only exploit par-allelism without optimizing for locality.
We also demonstrate that the performance of PolyMage generated code is better or compa-rable to implementations using OpenCV, a state-of-the-art image processing and computer vision library. | en_US |