• Login
    View Item 
    •   etd@IISc
    • Division of Interdisciplinary Research
    • Supercomputer Education and Research Centre (SERC)
    • View Item
    •   etd@IISc
    • Division of Interdisciplinary Research
    • Supercomputer Education and Research Centre (SERC)
    • View Item
    JavaScript is disabled for your browser. Some features of this site may not work without it.

    Automatic Compilation Of MATLAB Programs For Synergistic Execution On Heterogeneous Processors

    View/Open
    G25101.pdf (899.8Kb)
    Date
    2014-05-19
    Author
    Prasad, Ashwin
    Metadata
    Show full item record
    Abstract
    MATLAB is an array language, initially popular for rapid prototyping, but is now being in-creasingly used to develop production code for numerical and scientific applications. Typical MATLAB programs have abundant data parallelism. These programs also have control flow dominated scalar regions that have an impact on the program’s execution time. Today’s com-puter systems have tremendous computing power in the form of traditional CPU cores and also throughput-oriented accelerators such as graphics processing units (GPUs). Thus, an approach that maps the control flow dominated regions of a MATLAB program to the CPU and the data parallel regions to the GPU can significantly improve program performance. In this work, we present the design and implementation of MEGHA, a compiler that auto-matically compiles MATLAB programs to enable synergistic execution on heterogeneous pro-cessors. Our solution is fully automated and does not require programmer input for identifying data parallel regions. Our compiler identifies data parallel regions of the program and com-poses them into kernels. The kernel composition step eliminates a number of intermediate arrays which are otherwise required and also reduces the size of the scheduling and mapping problem the compiler needs to solve subsequently. The problem of combining statements into kernels is formulated as a constrained graph clustering problem. Heuristics are presented to map identified kernels to either the CPU or GPU so that kernel execution on the CPU and the GPU happens synergistically, and the amount of data transfer needed is minimized. A heuristic technique to ensure that memory accesses on the CPU exploit locality and those on the GPU are coalesced is also presented. In order to ensure that data transfers required for dependences across basic blocks are performed, we propose a data flow analysis step and an edge-splitting strategy. Thus our compiler automatically handles kernel composition, mapping of kernels to CPU and GPU, scheduling and insertion of required data transfers. Additionally, we address the problem of identifying what variables can coexist in GPU memory simultaneously under the GPU memory constraints. We formulate this problem as that of identifying maximal cliques in an interference graph. We approximate the interference graph using an interval graph and develop an efficient algorithm to solve the problem. Furthermore, we present two program transformations that optimize memory accesses on the GPU using the software managed scratchpad memory available in GPUs. We have prototyped the proposed compiler using the Octave system. Our experiments using this implementation show a geometric mean speedup of 12X on the GeForce 8800 GTS and 29.2X on the Tesla S1070 over baseline MATLAB execution for data parallel benchmarks. Experiments also reveal that our method provides up to 10X speedup over hand written GPUmat versions of the benchmarks. Our method also provides a speedup of 5.3X on the GeForce 8800 GTS and 13.8X on the Tesla S1070 compared to compiled MATLAB code running on the CPU.
    URI
    https://etd.iisc.ac.in/handle/2005/2312
    Collections
    • Supercomputer Education and Research Centre (SERC) [98]

    Related items

    Showing items related by title, author, creator and subject.

    • Access Path Based Dataflow Analysis For Sequential And Concurrent Programs 

      Arnab De, * (2016-09-14)
      In this thesis, we have developed a flow-sensitive data flow analysis framework for value set analyses for Java-like languages. Our analysis frame work is based on access paths—a variable followed by zero or more field ...
    • Approximate Dynamic Programming and Reinforcement Learning - Algorithms, Analysis and an Application 

      Lakshminarayanan, Chandrashekar (2018-08-13)
      Problems involving optimal sequential making in uncertain dynamic systems arise in domains such as engineering, science and economics. Such problems can often be cast in the framework of Markov Decision Process (MDP). ...
    • A Runtime Framework for Regular and Irregular Message-Driven Parallel Applications on GPU Systems 

      Rengasamy, Vasudevan (2018-02-28)
      The effective use of GPUs for accelerating applications depends on a number of factors including effective asynchronous use of heterogeneous resources, reducing data transfer between CPU and GPU, increasing occupancy of ...

    etd@IISc is a joint service of SERC & J R D Tata Memorial (JRDTML) Library || Powered by DSpace software || DuraSpace
    Contact Us | Send Feedback | Thesis Templates
    Theme by 
    Atmire NV
     

     

    Browse

    All of etd@IIScCommunities & CollectionsTitlesAuthorsAdvisorsSubjectsBy Thesis Submission DateThis CollectionTitlesAuthorsAdvisorsSubjectsBy Thesis Submission Date

    My Account

    LoginRegister

    etd@IISc is a joint service of SERC & J R D Tata Memorial (JRDTML) Library || Powered by DSpace software || DuraSpace
    Contact Us | Send Feedback | Thesis Templates
    Theme by 
    Atmire NV