Hardware Consolidation Of Systolic Algorithms On A Coarse Grained Runtime Reconfigurable Architecture

Biswas, Prasenjit

dc.contributor.advisor	Nandy, S K
dc.contributor.author	Biswas, Prasenjit
dc.date.accessioned	2013-07-10T07:56:16Z
dc.date.accessioned	2018-07-31T05:09:04Z
dc.date.available	2013-07-10T07:56:16Z
dc.date.available	2018-07-31T05:09:04Z
dc.date.issued	2013-07-10
dc.date.submitted	2011
dc.identifier.uri	https://etd.iisc.ac.in/handle/2005/2108
dc.identifier.abstract	https://etd.iisc.ac.in/static/etd/abstracts/2705/G24895-Abs.pdf	en_US
dc.description.abstract	Application domains such as Bioinformatics, DSP, Structural Biology, Fluid Dynamics, high resolution direction finding, state estimation and adaptive noise cancellation demand high performance computing solutions for their simulation environments. The core computations of these applications are Numerical Linear Algebra (NLA) kernels. Direct solvers are predominantly required in domains such as DSP and estimation algorithms such as the Kalman filter, where the matrices to be operated on are small or medium sized but dense. Faddeev's Algorithm is often used for solving dense linear systems of equations. The Modified Faddeev's Algorithm (MFA) is a general algorithm on which LU decomposition, QR factorization or SVD of matrices can be realized. MFA realizes a host of matrix operations by computing Schur complements on four blocked matrices, thereby reducing the overall computation requirements. We use MFA as a representative direct solver in this work. We further discuss the Givens rotation based QR algorithm for decomposition of any matrix, often used to solve the linear least squares problem. Systolic array architectures are widely accepted ASIC solutions for NLA algorithms, but the "can of worms" associated with this traditional solution spawns the need for alternatives. While the popular custom hardware solution in the form of systolic arrays can deliver high performance, their rigid structure makes them neither scalable nor reconfigurable, and hence not commercially viable. We show how a reconfigurable computing platform can serve to contain the "can of worms". REDEFINE, a coarse grained runtime reconfigurable architecture, has been used for systolic actualization of NLA kernels. We elaborate upon streaming NLA-specific enhancements to REDEFINE in order to meet expected performance goals. We explore the need for an algorithm aware custom compilation framework.
We propose a realization of Faddeev's Algorithm on REDEFINE and show that REDEFINE performs several times faster than traditional GPPs. We then turn to QR Decomposition as the next NLA kernel, since it offers better numerical stability than LU and other decompositions. We use QR Decomposition as a case study to explore the design space of the proposed solution on REDEFINE. We also investigate the architectural details of the Custom Functional Units (CFU) for these NLA kernels. We determine the right size of the sub-array in accordance with the optimal pipeline depth of the core execution units and the number of such units to be used per sub-array. The framework used to realize QR Decomposition can be generalized to other decomposition algorithms such as LU, Faddeev's Algorithm and Gauss-Jordan, with different CFU definitions.	en_US
dc.language.iso	en_US	en_US
dc.relation.ispartofseries	G24895	en_US
dc.subject	Computer Architecture	en_US
dc.subject	Systolic Algorithms	en_US
dc.subject	REDEFINE	en_US
dc.subject	Numerical Linear Algebra Kernels	en_US
dc.subject	NLA Kernels	en_US
dc.subject	Custom Functional Units (CFU)	en_US
dc.subject.classification	Computer Science	en_US
dc.title	Hardware Consolidation Of Systolic Algorithms On A Coarse Grained Runtime Reconfigurable Architecture	en_US
dc.type	Thesis	en_US
dc.degree.name	MSc Engg	en_US
dc.degree.level	Masters	en_US
dc.degree.discipline	Faculty of Engineering	en_US
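
Illustrative note (not part of the thesis record): the abstract describes Faddeev's Algorithm as computing a Schur complement on four blocked matrices. A minimal sketch of that idea, assuming the standard textbook formulation in which the compound matrix [[A, B], [-C, D]] is reduced by Gaussian elimination until the -C block is zero, leaving D + C A^{-1} B in the lower-right block. The function name `faddeev` and the use of exact rational arithmetic are assumptions made for this example; they say nothing about how the thesis maps the computation onto REDEFINE.

```python
from fractions import Fraction


def faddeev(A, B, C, D):
    """Return D + C * A^{-1} * B by eliminating the -C block of [[A, B], [-C, D]].

    A is n x n, B is n x m, C is p x n, D is p x m (lists of row lists).
    """
    n = len(A)
    # Build the compound matrix with exact rational entries.
    top = [[Fraction(x) for x in ra + rb] for ra, rb in zip(A, B)]
    bottom = [[Fraction(-x) for x in rc] + [Fraction(x) for x in rd]
              for rc, rd in zip(C, D)]
    M = top + bottom
    for k in range(n):
        # Pivot only among the top n rows, so [-C, D] rows are never swapped upward.
        p = next((r for r in range(k, n) if M[r][k] != 0), None)
        if p is None:
            raise ValueError("A is singular")
        M[k], M[p] = M[p], M[k]
        # Eliminate column k from every row below the pivot, including the -C rows.
        for r in range(k + 1, len(M)):
            f = M[r][k] / M[k][k]
            if f != 0:
                M[r] = [a - f * b for a, b in zip(M[r], M[k])]
    # The lower-right block now holds the Schur complement D + C A^{-1} B.
    return [row[n:] for row in M[n:]]


# Solving A x = b is the special case C = I, D = 0, which yields A^{-1} b.
x = faddeev([[2, 1], [1, 3]], [[3], [5]], [[1, 0], [0, 1]], [[0], [0]])
print(x)  # the solution is x = 4/5, y = 7/5
```

Other choices of the four blocks give other operations from the same elimination: for example, A = I and D = 0 yields the product C B, and B = I, C = I, D = 0 yields the inverse of A.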

Files in this item

Name: G24895.pdf
Size: 767.7Kb
Format: PDF

This item appears in the following Collection(s)

Supercomputer Education and Research Centre (SERC) [116]
