An Optimizing Code Generator for a Class of Lattice-Boltzmann Computations

Pananilath, Irshad Muhammed

dc.contributor.advisor	Bondhugula, Uday
dc.contributor.author	Pananilath, Irshad Muhammed
dc.date.accessioned	2018-03-09T06:54:29Z
dc.date.accessioned	2018-07-31T04:38:57Z
dc.date.available	2018-03-09T06:54:29Z
dc.date.available	2018-07-31T04:38:57Z
dc.date.issued	2018-03-09
dc.date.submitted	2014
dc.identifier.uri	https://etd.iisc.ac.in/handle/2005/3259
dc.identifier.abstract	https://etd.iisc.ac.in/static/etd/abstracts/4120/G26635-Abs.pdf	en_US
dc.description.abstract	The Lattice-Boltzmann method (LBM), a promising new particle-based simulation technique for complex and multiscale fluid flows, has seen tremendous adoption in recent years in computational fluid dynamics. Even with a state-of-the-art LBM solver such as Palabos, a user still has to write the program manually using the library-supplied primitives. We propose an automated code generator for a class of LBM computations with the objective of achieving high performance on modern architectures. Tiling is a very important loop transformation used to improve the performance of stencil computations by exploiting locality and parallelism. In the first part of the work, we explore diamond tiling, a new tiling technique that exploits the inherent ability of most stencils to allow tile-wise concurrent start. This enables perfect load balance during execution and reduces the frequency of synchronization required. Few studies have looked at time tiling for LBM codes. We exploit a key similarity between stencils and LBM to enable polyhedral optimizations and, in turn, time tiling for LBM. Besides polyhedral transformations, we also describe a number of other complementary transformations and post-processing steps necessary to obtain good parallel and SIMD performance on modern architectures. We also characterize the performance of LBM with the Roofline performance model. Experimental results for standard LBM simulations such as Lid Driven Cavity, Flow Past Cylinder, and Poiseuille Flow show that our scheme consistently outperforms Palabos, on average by 3x, while running on 16 cores of an Intel Xeon Sandy Bridge system. We also obtain a very significant improvement of 2.47x over the native production compiler on the SPECLBM benchmark.	en_US
dc.language.iso	en_US	en_US
dc.relation.ispartofseries	G26635	en_US
dc.subject	Lattice-Boltzmann Computations	en_US
dc.subject	Computational Fluid Dynamics	en_US
dc.subject	Tiling Stencil Computations	en_US
dc.subject	Single Instruction Multiple Data (SIMD)	en_US
dc.subject	Parallel Computers	en_US
dc.subject	Parallel Processing	en_US
dc.subject	Loop Transformations	en_US
dc.subject	Lattice-Boltzmann Method (LBM)	en_US
dc.subject	Lattice Boltzmann Method	en_US
dc.subject	Lattice-Boltzmann Equation	en_US
dc.subject.classification	Computer Science	en_US
dc.title	An Optimizing Code Generator for a Class of Lattice-Boltzmann Computations	en_US
dc.type	Thesis	en_US
dc.degree.name	MSc Engg	en_US
dc.degree.level	Masters	en_US
dc.degree.discipline	Faculty of Engineering	en_US