dc.contributor.advisor: Bondhugula, Uday
dc.contributor.author: Pananilath, Irshad Muhammed
dc.date.accessioned: 2018-03-09T06:54:29Z
dc.date.accessioned: 2018-07-31T04:38:57Z
dc.date.available: 2018-03-09T06:54:29Z
dc.date.available: 2018-07-31T04:38:57Z
dc.date.issued: 2018-03-09
dc.date.submitted: 2014
dc.identifier.uri: https://etd.iisc.ac.in/handle/2005/3259
dc.identifier.abstract: http://etd.iisc.ac.in/static/etd/abstracts/4120/G26635-Abs.pdf [en_US]
dc.description.abstract: The Lattice-Boltzmann method (LBM), a promising new particle-based simulation technique for complex and multiscale fluid flows, has seen tremendous adoption in computational fluid dynamics in recent years. Even with a state-of-the-art LBM solver such as Palabos, a user still has to write their program manually using the library-supplied primitives. We propose an automated code generator for a class of LBM computations, with the objective of achieving high performance on modern architectures. Tiling is a very important loop transformation that improves the performance of stencil computations by exploiting locality and parallelism. In the first part of the work, we explore diamond tiling, a new tiling technique that exploits the inherent ability of most stencils to allow tile-wise concurrent start. This enables perfect load balance during execution and reduces the frequency of synchronization required. Few studies have looked at time tiling for LBM codes. We exploit a key similarity between stencils and LBM to enable polyhedral optimizations, and in turn time tiling, for LBM. Besides polyhedral transformations, we also describe a number of other complementary transformations and the post-processing necessary to obtain good parallel and SIMD performance on modern architectures. We also characterize the performance of LBM with the Roofline performance model. Experimental results for standard LBM simulations such as Lid Driven Cavity, Flow Past Cylinder, and Poiseuille Flow show that our scheme consistently outperforms Palabos, on average by 3x, while running on 16 cores of an Intel Xeon Sandy Bridge system. We also obtain a very significant improvement of 2.47x over the native production compiler on the SPEC LBM benchmark. [en_US]
dc.language.iso: en_US [en_US]
dc.relation.ispartofseries: G26635 [en_US]
dc.subject: Lattice-Boltzmann Computations [en_US]
dc.subject: Computational Fluid Dynamics [en_US]
dc.subject: Tiling Stencil Computations [en_US]
dc.subject: Single Instruction Multiple Data (SIMD) [en_US]
dc.subject: Parallel Computers [en_US]
dc.subject: Parallel Processing [en_US]
dc.subject: Loop Transformations [en_US]
dc.subject: Lattice-Boltzmann Method (LBM) [en_US]
dc.subject: Lattice Boltzmann Method [en_US]
dc.subject: Lattice-Boltzmann Equation [en_US]
dc.subject.classification: Computer Science [en_US]
dc.title: An Optimizing Code Generator for a Class of Lattice-Boltzmann Computations [en_US]
dc.type: Thesis [en_US]
dc.degree.name: MSc Engg [en_US]
dc.degree.level: Masters [en_US]
dc.degree.discipline: Faculty of Engineering [en_US]
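
The abstract rests on a "key similarity between stencils and LBM". One way to see it is a fused pull-stream/collide step: the new distributions at cell (x, y) depend only on fixed-offset neighbours of the previous time step, which is exactly a 9-point stencil. The following is a minimal pure-Python sketch under stated assumptions: a D2Q9 lattice, BGK collision, and a fully periodic domain. All names (`C`, `W`, `TAU`, `equilibrium`, `lbm_step`, `total_mass`) are hypothetical. This is neither Palabos's API nor the thesis's generated code.

```python
# Minimal D2Q9 lattice-Boltzmann step written as a 9-point stencil (sketch).
# Assumptions: BGK collision, periodic boundaries, double-precision floats.

# D2Q9 discrete velocities (cx, cy) and their lattice weights.
C = [(0, 0), (1, 0), (0, 1), (-1, 0), (0, -1),
     (1, 1), (-1, 1), (-1, -1), (1, -1)]
W = [4 / 9] + [1 / 9] * 4 + [1 / 36] * 4
TAU = 0.8  # hypothetical BGK relaxation time (must exceed 0.5 for stability)


def equilibrium(rho, ux, uy):
    """Second-order BGK equilibrium distributions for one cell."""
    usq = ux * ux + uy * uy
    feq = []
    for (cx, cy), w in zip(C, W):
        cu = cx * ux + cy * uy
        feq.append(w * rho * (1 + 3 * cu + 4.5 * cu * cu - 1.5 * usq))
    return feq


def lbm_step(f, nx, ny, tau):
    """One fused pull-stream + collide step on a periodic nx-by-ny grid.

    f[k][x][y] is population k at cell (x, y). The new value at (x, y) reads
    only f[k][x - cx][y - cy] from the old lattice: a fixed-shape stencil.
    """
    g = [[[0.0] * ny for _ in range(nx)] for _ in C]
    for x in range(nx):
        for y in range(ny):
            # Stream (pull): gather each population from its upwind neighbour.
            fin = [f[k][(x - cx) % nx][(y - cy) % ny]
                   for k, (cx, cy) in enumerate(C)]
            # Macroscopic density and velocity of the gathered populations.
            rho = sum(fin)
            ux = sum(fk * cx for fk, (cx, _) in zip(fin, C)) / rho
            uy = sum(fk * cy for fk, (_, cy) in zip(fin, C)) / rho
            # Collide: relax every population towards its equilibrium.
            feq = equilibrium(rho, ux, uy)
            for k in range(len(C)):
                g[k][x][y] = fin[k] - (fin[k] - feq[k]) / tau
    return g


def total_mass(f):
    """Sum of all populations over the whole lattice."""
    return sum(v for plane in f for row in plane for v in row)


# Usage: start from equilibrium with a small density bump on the diagonal.
NX, NY = 8, 6
f0 = [[[0.0] * NY for _ in range(NX)] for _ in C]
for x in range(NX):
    for y in range(NY):
        rho0 = 1.1 if x == y else 1.0
        for k, v in enumerate(equilibrium(rho0, 0.05, 0.0)):
            f0[k][x][y] = v

f10 = f0
for _ in range(10):
    f10 = lbm_step(f10, NX, NY, TAU)
```

Because BGK collision conserves density locally and periodic streaming only moves populations around, `total_mass(f10)` should match `total_mass(f0)` up to rounding. A generated, optimized kernel would typically use flat arrays, a specific memory layout, and SIMD-friendly loops rather than nested Python lists.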

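To make "tile-wise concurrent start" concrete, the next sketch diamond-tiles a 1-D 3-point Jacobi stencil in the (t, i) iteration space. The example is an assumption of mine and is deliberately simpler than LBM. The dependence vectors are (1,-1), (1,0), and (1,1), and both hyperplanes t+i and t-i are non-decreasing along each of them. Tiling by those two hyperplanes is therefore legal and yields diamond-shaped tiles. Tiles that share the same a+b lie on one wavefront and are mutually independent, and the tiles along the t=1 boundary fall into the first wavefronts, so they can all start together. The wavefronts run serially here, so the sketch emulates the parallel schedule rather than executing it. The kernel and the helper names (`stencil`, `naive`, `diamond_tiled`) are hypothetical and are not Pluto's generated code.

```python
# Diamond-tiled execution order for a 1-D 3-point Jacobi stencil (sketch).
from collections import defaultdict


def stencil(left, mid, right):
    """Hypothetical 3-point averaging kernel."""
    return (left + mid + right) / 3.0


def naive(u0, steps):
    """Reference: plain time loop with fixed boundary values."""
    u = list(u0)
    for _ in range(steps):
        nxt = list(u)  # u[0] and u[-1] stay fixed
        for i in range(1, len(u) - 1):
            nxt[i] = stencil(u[i - 1], u[i], u[i + 1])
        u = nxt
    return u


def diamond_tiled(u0, steps, tile):
    """Same computation, executed tile by tile in wavefront order.

    Tile coordinates (a, b) = ((t + i) // tile, (t - i) // tile). Every
    dependence keeps a and b non-decreasing, so all tiles with equal a + b
    are independent and could run in parallel.
    """
    n = len(u0)
    # Full space-time storage for clarity; real codes use modulo buffers.
    u = [list(u0) for _ in range(steps + 1)]
    tiles = defaultdict(list)
    for t in range(1, steps + 1):
        for i in range(1, n - 1):
            tiles[((t + i) // tile, (t - i) // tile)].append((t, i))
    # Wavefront by wavefront; inside a tile, points in (t, i) order.
    for a, b in sorted(tiles, key=lambda ab: (ab[0] + ab[1], ab)):
        for t, i in sorted(tiles[(a, b)]):
            u[t][i] = stencil(u[t - 1][i - 1], u[t - 1][i], u[t - 1][i + 1])
    return u[steps]


# Usage: the tiled schedule must reproduce the naive result exactly.
u0 = [float(i % 5) for i in range(20)]
ref = naive(u0, steps=12)
tiled = diamond_tiled(u0, steps=12, tile=4)
```

The two schedules perform identical floating-point operations on identical inputs, so `tiled == ref` holds bit for bit. Only the execution order differs, and that is what lets a compiler keep a diamond's data in cache across several time steps.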

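The Roofline characterization mentioned in the abstract bounds attainable performance by min(peak compute, arithmetic intensity × memory bandwidth). The short sketch below applies that formula to one LBM lattice-site update. Every machine number and the FLOP count are hypothetical placeholders, not measurements from the thesis. The 144-byte figure assumes D2Q9 in double precision with 9 reads and 9 writes per update, and it ignores write-allocate and flag-array traffic.

```python
# Roofline bound for one LBM lattice-site update (sketch, hypothetical numbers).

PEAK_GFLOPS = 300.0    # hypothetical double-precision compute roof
BANDWIDTH_GBS = 40.0   # hypothetical sustained DRAM bandwidth

# D2Q9, double precision: read 9 + write 9 distributions = 144 bytes/update.
BYTES_PER_UPDATE = 2 * 9 * 8
FLOPS_PER_UPDATE = 200.0  # hypothetical placeholder; count your own kernel


def roofline(peak_gflops, bandwidth_gbs, flops, bytes_moved):
    """Return (arithmetic intensity in FLOP/byte, attainable GFLOP/s)."""
    intensity = flops / bytes_moved
    return intensity, min(peak_gflops, intensity * bandwidth_gbs)


ai, bound = roofline(PEAK_GFLOPS, BANDWIDTH_GBS,
                     FLOPS_PER_UPDATE, BYTES_PER_UPDATE)
# Intensity at which the kernel would stop being memory bound.
ridge = PEAK_GFLOPS / BANDWIDTH_GBS
# Memory-bound throughput in million lattice updates per second (MLUP/s).
mlups = BANDWIDTH_GBS * 1e3 / BYTES_PER_UPDATE
```

With these placeholder values, the intensity (about 1.39 FLOP/byte) sits well below the ridge point (7.5 FLOP/byte), so a streaming LBM sweep is memory bound. Time tiling targets exactly this regime. Reusing cached data across several time steps lowers DRAM traffic per update, which raises the effective intensity.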