dc.contributor.advisor | Nandy, S K | |
dc.contributor.author | Kulkarni, Pratik | |
dc.date.accessioned | 2022-07-27T04:32:58Z | |
dc.date.available | 2022-07-27T04:32:58Z | |
dc.date.submitted | 2021 | |
dc.identifier.uri | https://etd.iisc.ac.in/handle/2005/5800 | |
dc.description.abstract | Matrix-matrix multiplication is an important operation for many applications and hence it is
required to be parallelized optimally for the architecture the applications will run on. REDEFINE
is a many-core co-processor, and a high-performance implementation of matrix-matrix
multiplication on it requires utilizing the different memory layers to amortize data movement between
them with useful computation. This thesis presents an approach aimed at achieving this goal
along with distributing load between the nodes of REDEFINE to achieve good load-balancing.
The approach described in the thesis first makes use of the elemental distribution of matrices,
which, apart from achieving good load-balancing, also decouples the storage blocking size from the
algorithmic blocking size by setting the storage blocking size to 1. Then the BLIS framework,
which breaks the inner kernel of the GotoBLAS GEMM implementation into two additional
loops around a micro-kernel, is employed on every node. A modified eSUMMA2D-C algorithm
enables using the elemental distribution of matrices with the BLIS framework. The BLIS
framework not only allows for exploiting the memory layers of REDEFINE to amortize the cost
of moving data between the memory layers but also provides opportunities for parallelism
within the inner kernel of the GotoBLAS implementation.
The fork-join model has been used for parallelization as well as for synchronization. The
values of the blocking parameters have been chosen such that they will enable optimal amortization
of moving data between memory layers with computation and also achieve a high degree
of parallelism from the loops being parallelized. Thus, this approach integrates the elemental
distribution of matrices with the BLIS framework to achieve good load-balancing between the
nodes of REDEFINE along with optimal amortization of moving elements of matrices between
the different memory layers with computation and a high degree of parallelization. | en_US |
dc.language.iso | en_US | en_US |
dc.rights | I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation | en_US |
dc.subject | Matrix Multiplication | en_US |
dc.subject | High-performance Computing | en_US |
dc.subject | Computer Architecture | en_US |
dc.subject.classification | Research Subject Categories::TECHNOLOGY::Information technology::Computer science | en_US |
dc.title | Optimizing Matrix Multiplication for the REDEFINE Many-Core Co-processor | en_US |
dc.type | Thesis | en_US |
dc.degree.name | MTech (Res) | en_US |
dc.degree.level | Masters | en_US |
dc.degree.grantor | Indian Institute of Science | en_US |
dc.degree.discipline | Engineering | en_US |