Show simple item record

dc.contributor.advisorNandy, S K
dc.contributor.authorBiswas, Prasenjit
dc.date.accessioned2013-07-10T07:56:16Z
dc.date.accessioned2018-07-31T05:09:04Z
dc.date.available2013-07-10T07:56:16Z
dc.date.available2018-07-31T05:09:04Z
dc.date.issued2013-07-10
dc.date.submitted2011-07
dc.identifier.urihttp://etd.iisc.ac.in/handle/2005/2108
dc.identifier.abstracthttp://etd.iisc.ac.in/static/etd/abstracts/2705/G24895-Abs.pdfen_US
dc.description.abstractApplication domains such as Bio-informatics, DSP, Structural Biology, Fluid Dynamics, high resolution direction finding, state estimation, adaptive noise cancellation etc. demand high performance computing solutions for their simulation environments. The core computations of these applications are in Numerical Linear Algebra (NLA) kernels. Direct solvers are predominantly required in the domains like DSP, estimation algorithms like Kalman Filter etc, where the matrices on which operations need to be performed are either small or medium sized, but dense. Faddeev's Algorithm is often used for solving dense linear system of equations. Modified Faddeev's algorithm (MFA) is a general algorithm on which LU decomposition, QR factorization or SVD of matrices can be realized. MFA has the good property of realizing a host of matrix operations by computing the Schur complements on four blocked matrices, thereby reducing the overall computation requirements. We will use MFA as a representative Direct Solver in this work. We further discuss Given's rotation based QR algorithm for Decomposition of any matrix, often used to solve the linear least square problem. Systolic Array Architectures are widely accepted ASIC solutions for NLA algorithms. But the \can of worms" associated with this traditional solution spawns the need for alternative solutions. While popular custom hardware solution in form of systolic arrays can deliver high performance, but because of their rigid structure they are not scalable and reconfigurable, and hence not commercially viable. We show how a Reconfigurable computing platform can serve to contain the \can of worms". REDEFINE, a coarse grained runtime reconfigurable architecture has been used for systolic actualization of NLA kernels. We elaborate upon streaming NLA-specific enhancements to REDEFINE in order to meet expected performance goals. We explore the need for an algorithm aware custom compilation framework. We bring about a proposition to realize Faddeev's Algorithm on REDEFINE. We show that REDEFINE performs several times faster than traditional GPPs. Further we direct our interest to QR Decomposition to be the next NLA kernel as it ensures better stability than LU and other decompositions. We use QR Decomposition as a case study to explore the design space of the proposed solution on REDEFINE. We also investigate the architectural details of the Custom Functional Units (CFU) for these NLA kernels. We determine the right size of the sub-array in accordance with the optimal pipeline depth of the core execution units and the number of such units to be used per sub-array. The framework used to realize QR Decomposition can be generalized for the realization of other algorithms dealing with decompositions like LU, Faddeev's Algorithm, Gauss-Jordon etc with different CFU definitions .en_US
dc.language.isoen_USen_US
dc.relation.ispartofseriesG24895en_US
dc.subjectComputer Architectureen_US
dc.subjectSystolic Algorithmsen_US
dc.subjectREDEFINEen_US
dc.subjectNumerical Linear Algebra Kernelsen_US
dc.subjectNLA Kernelsen_US
dc.subjectCustom Functional Units (CFU)en_US
dc.subject.classificationComputer Scienceen_US
dc.titleHardware Consolidation Of Systolic Algorithms On A Coarse Grained Runtime Reconfigurable Architectureen_US
dc.typeThesisen_US
dc.degree.nameMSc Enggen_US
dc.degree.levelMastersen_US
dc.degree.disciplineFaculty of Engineeringen_US


Files in this item

This item appears in the following Collection(s)

Show simple item record