dc.contributor.advisorNandy, S K
dc.contributor.authorKulkarni, Pratik
dc.date.accessioned2022-07-27T04:32:58Z
dc.date.available2022-07-27T04:32:58Z
dc.date.submitted2021
dc.identifier.urihttps://etd.iisc.ac.in/handle/2005/5800
dc.description.abstractMatrix-matrix multiplication is an important operation in many applications and must therefore be parallelized effectively for the architecture on which those applications run. REDEFINE is a many-core co-processor, and a high-performance implementation of matrix-matrix multiplication on it requires using the different memory layers so that data movement between them is amortized by useful computation. This thesis presents an approach aimed at achieving this goal while distributing load between the nodes of REDEFINE to achieve good load-balancing. The approach first makes use of the elemental distribution of matrices, which, apart from achieving good load-balancing, also decouples the storage blocking size from the algorithmic blocking size by setting the storage blocking size to 1. Then the BLIS framework, which breaks the inner kernel of the GotoBLAS GEMM implementation into two additional loops around a micro-kernel, is employed on every node. A modified eSUMMA2D-C algorithm enables the elemental distribution of matrices to be used with the BLIS framework. The BLIS framework not only allows the memory layers of REDEFINE to be exploited to amortize the cost of moving data between them but also provides opportunities for parallelism within the inner kernel of the GotoBLAS implementation. The fork-join model is used for parallelization as well as for synchronization. The values of the blocking parameters are chosen to enable optimal amortization of data movement between memory layers with computation and to achieve a high degree of parallelism from the loops being parallelized.
Thus, this approach integrates the elemental distribution of matrices with the BLIS framework to achieve good load-balancing between the nodes of REDEFINE, optimal amortization of the movement of matrix elements between the different memory layers with computation, and a high degree of parallelization.en_US
dc.language.isoen_USen_US
dc.rightsI grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.en_US
dc.subjectMatrix Multiplicationen_US
dc.subjectHigh-performance Computingen_US
dc.subjectComputer Architectureen_US
dc.subject.classificationResearch Subject Categories::TECHNOLOGY::Information technology::Computer scienceen_US
dc.titleOptimizing Matrix Multiplication for the REDEFINE Many-Core Co-processoren_US
dc.typeThesisen_US
dc.degree.nameMTech (Res)en_US
dc.degree.levelMastersen_US
dc.degree.grantorIndian Institute of Scienceen_US
dc.degree.disciplineEngineeringen_US
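
The abstract states that the elemental distribution sets the storage blocking size to 1. One common reading of this is an element-cyclic layout, in which consecutive matrix elements are dealt out to consecutive nodes of a 2D grid. The sketch below illustrates that reading only; the grid shape, the modulo mapping, and the names owner and load_per_node are assumptions made for illustration and are not taken from the thesis or from REDEFINE.

```python
from collections import Counter


def owner(i, j, grid_rows, grid_cols):
    """Return the (row, col) grid coordinates of the node that stores
    element (i, j) under an assumed element-cyclic layout (block size 1)."""
    return (i % grid_rows, j % grid_cols)


def load_per_node(m, n, grid_rows, grid_cols):
    """Count how many elements of an m x n matrix each node stores."""
    counts = Counter()
    for i in range(m):
        for j in range(n):
            counts[owner(i, j, grid_rows, grid_cols)] += 1
    return counts


if __name__ == "__main__":
    # Hypothetical 4 x 4 node grid holding an 8 x 8 matrix.
    for node, count in sorted(load_per_node(8, 8, 4, 4).items()):
        print(node, count)
```

When the matrix dimensions are multiples of the grid dimensions, every node in this sketch stores the same number of elements, which is the load-balancing property the abstract refers to. Otherwise the per-node row counts and column counts each differ by at most one.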

