Tiling Stencil Computations To Maximize Parallelism

Bandishti, Vinayaka Prakasha

dc.contributor.advisor	Bondhugula, Uday
dc.contributor.author	Bandishti, Vinayaka Prakasha
dc.date.accessioned	2017-05-21T11:42:33Z
dc.date.accessioned	2018-07-31T04:38:36Z
dc.date.available	2017-05-21T11:42:33Z
dc.date.available	2018-07-31T04:38:36Z
dc.date.issued	2017-05-21
dc.date.submitted	2013
dc.identifier.uri	https://etd.iisc.ac.in/handle/2005/2619
dc.identifier.abstract	https://etd.iisc.ac.in/static/etd/abstracts/3407/G26301-Abs.pdf	en_US
dc.description.abstract	Stencil computations are iterative kernels often used to simulate the change in a discretized spatial domain over time (e.g., computational fluid dynamics) or to solve for unknowns in a discretized space by converging to a steady state (i.e., partial differential equations). They are commonly found in many scientific and engineering applications. Most stencil computations allow tile-wise concurrent start, i.e., there exists a face of the iteration space and a set of tiling hyperplanes such that all tiles along that face can be started concurrently. This provides load balance and maximizes parallelism. Loop tiling is a key transformation used to exploit both data locality and parallelism from stencils simultaneously. Numerous works target improving locality, controlling the frequency of synchronization, and reducing the volume of communication wherever applicable. However, concurrent start-up of tiles, which evidently translates into perfect load balance and often a reduction in the frequency of synchronization, is completely ignored. Existing automatic tiling frameworks often choose hyperplanes that lead to pipelined start-up and load imbalance. We address this issue with a new tiling technique that ensures concurrent start-up as well as perfect load balance whenever possible. We first provide necessary and sufficient conditions on tiling hyperplanes to enable concurrent start for programs with affine data accesses. We then discuss an iterative approach to find such hyperplanes. Automatic tiling techniques cannot be applied directly to periodic stencils because of their wrap-around dependences. To overcome this, we use iteration space folding techniques as a pre-processing stage, after which our technique can be applied without any further change. We have implemented our techniques on top of Pluto, a source-level automatic parallelizer.
Experimental evaluation on a 12-core Intel Westmere shows that our code outperforms a tuned domain-specific stencil code generator by 4% to 2x, and previous compiler techniques by a factor of 1.5x to 15x. For the swim benchmark from SPECFP2000, we achieve an improvement of 5.12x on a 12-core Intel Westmere machine and 2.5x on a 16-core AMD Magny-Cours machine, over the auto-parallelizer of the Intel C Compiler.	en_US
dc.language.iso	en_US	en_US
dc.relation.ispartofseries	G26301	en_US
dc.subject	Stencil Computations	en_US
dc.subject	Concurrent Start-Up	en_US
dc.subject	Tiling Hyperplanes	en_US
dc.subject	Periodic Stencils	en_US
dc.subject	Compilers (Computer Programs)	en_US
dc.subject	Multiprocessors	en_US
dc.subject	Computer Architecture	en_US
dc.subject	Parallelism (Computer Architecture)	en_US
dc.subject	Tiling Stencil Computations	en_US
dc.subject	Automatic Parallelizers	en_US
dc.subject	Pluto-Source Level Automatic Parallelizer	en_US
dc.subject.classification	Computer Science	en_US
dc.title	Tiling Stencil Computations To Maximize Parallelism	en_US
dc.type	Thesis	en_US
dc.degree.name	MSc Engg	en_US
dc.degree.level	Masters	en_US
dc.degree.discipline	Faculty of Engineering	en_US
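
The abstract refers to necessary and sufficient conditions on tiling hyperplanes for concurrent start but does not spell them out. The sketch below is an illustration only, under two stated assumptions: (1) a hyperplane h is a valid tiling hyperplane when h . d >= 0 for every dependence distance vector d, and (2) concurrent start along a face with normal f is possible when f is a strictly positive combination of the tiling hyperplanes. The example uses a 1-D three-point stencil in (t, i) space with dependences (1, -1), (1, 0), (1, 1); all function names are hypothetical and are not part of Pluto.

```python
from fractions import Fraction


def is_valid_tiling_hyperplane(h, deps):
    """Assumed validity test: h . d >= 0 for every dependence vector d."""
    return all(h[0] * d[0] + h[1] * d[1] >= 0 for d in deps)


def cone_coefficients(h1, h2, f):
    """Solve f = a*h1 + b*h2 by Cramer's rule; None if h1 and h2 are parallel."""
    det = h1[0] * h2[1] - h1[1] * h2[0]
    if det == 0:
        return None
    a = Fraction(f[0] * h2[1] - f[1] * h2[0], det)
    b = Fraction(h1[0] * f[1] - h1[1] * f[0], det)
    return a, b


def allows_concurrent_start(h1, h2, f, deps):
    """Assumed condition: both hyperplanes valid and f strictly inside their cone."""
    if not (is_valid_tiling_hyperplane(h1, deps)
            and is_valid_tiling_hyperplane(h2, deps)):
        return False
    coeffs = cone_coefficients(h1, h2, f)
    if coeffs is None:
        return False
    a, b = coeffs
    return a > 0 and b > 0


# Dependences of a 1-D three-point stencil, written as (dt, di).
DEPS = [(1, -1), (1, 0), (1, 1)]
# Normal of the face t = 0, along which tiles should start concurrently.
FACE = (1, 0)

if __name__ == "__main__":
    # Diamond-shaped tiles: (1, 0) = 1/2 * (1, 1) + 1/2 * (1, -1).
    print(allows_concurrent_start((1, 1), (1, -1), FACE, DEPS))
    # Skewed tiles: (1, 0) lies on the boundary of the cone, not strictly inside.
    print(allows_concurrent_start((1, 0), (1, 1), FACE, DEPS))
```

Under these assumptions, the hyperplane pair (1, 1) and (1, -1) admits concurrent start along the t = 0 face, while the pair (1, 0) and (1, 1) is valid for tiling but leads to pipelined start-up.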