
    Optimizing Matrix Multiplication for the REDEFINE Many-Core Co-processor

    Author
    Kulkarni, Pratik
    Abstract
    Matrix-matrix multiplication is an important operation in many applications, so it must be parallelized efficiently for the architecture on which those applications run. REDEFINE is a many-core co-processor, and a high-performance implementation of matrix-matrix multiplication on it must use the different memory layers so that the cost of moving data between them is amortized over useful computation. This thesis presents an approach aimed at achieving this goal while distributing load across the nodes of REDEFINE to achieve good load balancing. The approach first uses the elemental distribution of matrices, which, apart from achieving good load balancing, also decouples the storage blocking size from the algorithmic blocking size by setting the storage blocking size to 1. The BLIS framework, which breaks the inner kernel of the GotoBLAS GEMM implementation into two additional loops around a micro-kernel, is then employed on every node. A modified eSUMMA2D-C algorithm enables the elemental distribution of matrices to be used with the BLIS framework. The BLIS framework not only allows the memory layers of REDEFINE to be exploited to amortize the cost of moving data between them, but also provides opportunities for parallelism within the inner kernel of the GotoBLAS implementation. The fork-join model is used for both parallelization and synchronization. The values of the blocking parameters are chosen to amortize data movement between memory layers over computation as fully as possible and to obtain a high degree of parallelism from the parallelized loops. The approach thus integrates the elemental distribution of matrices with the BLIS framework to achieve good load balancing between the nodes of REDEFINE, effective amortization of the cost of moving matrix elements between memory layers, and a high degree of parallelization.
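The two building blocks named in the abstract can be illustrated with a minimal Python sketch. It is not the thesis's implementation and does not reproduce the modified eSUMMA2D-C algorithm. It assumes the usual definition of the elemental distribution (element (i, j) is owned by node (i mod r, j mod c) of an r x c node grid, i.e. a storage blocking size of 1) and the usual BLIS loop nest, in which the GotoBLAS inner kernel becomes two loops around a micro-kernel. All function names, the grid shape, and the blocking values (`mc`, `kc`, `nc`, `mr`, `nr`) are hypothetical choices for illustration; packing of blocks into contiguous buffers and parallelization are omitted.

```python
def elemental_owner(i, j, grid_rows, grid_cols):
    """Node (row, col) owning element (i, j) under an element-wise cyclic
    distribution, i.e. a storage blocking size of 1 (assumed definition)."""
    return (i % grid_rows, j % grid_cols)


def elements_per_node(m, n, grid_rows, grid_cols):
    """Count how many elements of an m x n matrix each node owns."""
    counts = {}
    for i in range(m):
        for j in range(n):
            owner = elemental_owner(i, j, grid_rows, grid_cols)
            counts[owner] = counts.get(owner, 0) + 1
    return counts


def micro_kernel(A, B, C, i0, i1, j0, j1, p0, p1):
    """Update the small block C[i0:i1, j0:j1] with a sequence of rank-1
    updates over the shared index range p0..p1."""
    for p in range(p0, p1):
        for i in range(i0, i1):
            a = A[i][p]
            for j in range(j0, j1):
                C[i][j] += a * B[p][j]


def blocked_gemm(A, B, C, mc=4, kc=4, nc=4, mr=2, nr=2):
    """Compute C += A * B with a BLIS-style loop nest (no packing).

    The three outer loops partition n, k and m by the algorithmic blocking
    sizes nc, kc and mc. The two inner loops (jr, ir) are the additional
    loops around the micro-kernel; in a parallel version they are natural
    candidates for fork-join parallelism.
    """
    m, k, n = len(A), len(B), len(B[0])
    for jc in range(0, n, nc):              # loop over column panels of B, C
        j_end = min(jc + nc, n)
        for pc in range(0, k, kc):          # loop over the shared dimension
            p_end = min(pc + kc, k)
            for ic in range(0, m, mc):      # loop over row blocks of A, C
                i_end = min(ic + mc, m)
                for jr in range(jc, j_end, nr):      # first added loop
                    for ir in range(ic, i_end, mr):  # second added loop
                        micro_kernel(A, B, C,
                                     ir, min(ir + mr, i_end),
                                     jr, min(jr + nr, j_end),
                                     pc, p_end)
    return C
```

In this sketch the node grid only determines ownership, while `mc`, `kc`, `nc`, `mr` and `nr` are chosen independently of it, which is the decoupling of storage and algorithmic blocking that the abstract describes. For a 6 x 6 matrix on a 2 x 3 grid, `elements_per_node(6, 6, 2, 3)` assigns the same number of elements to every node.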
    URI
    https://etd.iisc.ac.in/handle/2005/5800
    Collections
    • Department of Computational and Data Sciences (CDS) [100]

    etd@IISc is a joint service of SERC & J R D Tata Memorial (JRDTML) Library || Powered by DSpace software || DuraSpace
    Contact Us | Send Feedback | Thesis Templates
    Theme by 
    Atmire NV
     

     
