    Characterization of Divergence resulting from Workload, Memory and Control-Flow behavior in GPGPU Applications

    Author
    Morkhande, Rahul Raj Kumar
    Abstract
    GPGPUs have emerged as high-performance computing platforms and are used to boost the performance of general non-graphics applications from various scientific domains. These applications span varied areas such as social networks, defense, bioinformatics, data science, medical imaging, fluid dynamics, etc. [3]. To efficiently exploit the computing potential of these accelerators, an application must be well mapped to the underlying architecture. As a result, applications running on GPGPUs exhibit different behavioral characteristics and are characterized as regular or irregular based on their behavior. Regular applications typically operate on array-like data structures whose run-time behavior can be statically predicted, whereas irregular applications operate on pointer-based data structures such as graphs and trees [2]. Irregular applications are generally characterized by a high degree of data-dependent control flow and memory accesses. In the literature, we find various efforts to characterize such applications, particularly the irregular ones, which exhibit behavior that results in run-time bottlenecks. Burtscher et al. [2] investigated various irregular GPGPU applications by quantifying control-flow and memory-access behaviors on a real GPU device. Molly et al. [4] analyzed performance aspects of these behaviors on a cycle-accurate GPU simulator [1]. Qiumin Xu et al. [5] studied the execution characteristics of graph-based applications on GPGPUs. All of these works characterized divergence at the kernel level but not at the thread level. In this work, we provide an in-depth characterization of three divergences resulting from 1) workload distribution, 2) memory access, and 3) control-flow behavior at different levels of the GPU thread hierarchy, with the purpose of analyzing and quantifying divergence characteristics at the warp, thread-block, and kernel level.
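As an illustration of the control-flow divergence described above, the following minimal sketch models how a data-dependent branch splits a warp and lowers SIMD-lane utilization. The 32-bit active masks, the `warp_utilization` function, and the example mask values are hypothetical constructions for exposition; they are not taken from the thesis.

```python
# Hypothetical sketch: SIMD-lane utilization under branch divergence.
# When a warp of 32 threads hits a data-dependent branch, the diverged
# paths are serialized, so some lanes idle on every issued instruction.
# Utilization is the mean fraction of active lanes per instruction slot.

WARP_SIZE = 32

def warp_utilization(active_masks):
    """Mean fraction of active lanes over the instructions a warp issues.

    active_masks: one 32-bit integer per issued instruction; bit i set
    means lane i executed that instruction.
    """
    if not active_masks:
        return 0.0
    total_active = sum(bin(mask).count("1") for mask in active_masks)
    return total_active / (WARP_SIZE * len(active_masks))

# Fully convergent warp: all 32 lanes active for every instruction.
convergent = [0xFFFFFFFF] * 4
# Divergent warp: an if/else splits the warp 20/12; both paths issue.
divergent = [0xFFFFFFFF, 0x000FFFFF, 0xFFF00000, 0xFFFFFFFF]

print(warp_utilization(convergent))  # 1.0
print(warp_utilization(divergent))   # 0.75
```

A regular application keeps utilization near 1.0, while the irregular, data-dependent branching described above drives it down.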
In Chapter 1, we review certain aspects of CPUs and GPUs and how they differ from each other. We then discuss various characteristics of GPGPU applications. In Chapter 2, we provide background on GPU architectures, the CUDA programming model, and the GPU SIMD execution model. We briefly explain key CUDA programming concepts such as the GPU thread hierarchy and the different addressable memory spaces, and describe the various behaviors that cause divergence across parallel threads. We then review the related work in the context of the divergence studied here, followed by this thesis's contributions. In Chapter 3, we explain our methodology for quantifying workload and branch divergence across threads at various levels of the thread organization. We then present our characterization methodology for quantifying the divergent aspects of memory instructions. In Chapter 4, we present our chosen benchmarks, taken from various suites, and show the baseline GPGPU-Sim configuration we used for evaluating our methodology. We then discuss our characterization results for workload and branch divergence at the warp, thread-block, and kernel level for some interesting application kernels. We examine the divergence behavior of graph-based applications and show how it varies across threads. We present our characterization of the memory-access behavior of irregular applications using an instruction classification based on spatial locality. We then discuss the relationship between throughput and the divergence measures by studying their correlation coefficients. To summarize, we quantified and analyzed the control-flow and workload divergence across threads at the warp, thread-block, and kernel level for a diverse collection of 12 GPGPU applications exhibiting both regular and irregular behavior. Using each thread's hardware utilization efficiency and a measure we call 'Average normalized instructions per thread', we quantify branch and workload divergence, respectively.
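A workload-divergence measure in the spirit of the "Average normalized instructions per thread" named above can be sketched as follows. Each thread's dynamic instruction count is normalized by the maximum count in its group (warp, thread block, or kernel), so 1.0 means perfectly balanced work and smaller values indicate skew. The function name, normalization choice, and example counts are assumptions for illustration; the thesis's exact definition may differ.

```python
# Hypothetical sketch of a workload-divergence measure: average of
# per-thread dynamic instruction counts, each normalized by the group
# maximum. A value of 1.0 means every thread did equal work; lower
# values indicate data-dependent workload imbalance.

def avg_normalized_instructions(insn_counts):
    """Average per-thread instruction count normalized by the group max."""
    peak = max(insn_counts)
    if peak == 0:
        return 0.0
    return sum(count / peak for count in insn_counts) / len(insn_counts)

# A balanced (regular) warp vs. a skewed (irregular) one, e.g. a graph
# kernel where per-thread work depends on vertex degree.
balanced = [100] * 32
skewed = [100] + [10] * 31

print(avg_normalized_instructions(balanced))           # 1.0
print(round(avg_normalized_instructions(skewed), 4))   # 0.1281
```

Computing this measure per warp, per thread block, and per kernel mirrors the three levels of the thread hierarchy at which the thesis reports its characterization.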
Our characterization technique for memory divergence classifies memory instructions into four different groups based on the intra-warp spatial locality of the instructions. We then quantify the impact of memory divergence using the behavior of the GPU L1 data cache.
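The idea of grouping memory instructions by intra-warp spatial locality can be sketched as below: the fewer distinct cache lines the 32 lane addresses of one warp-level memory instruction touch, the better the access coalesces. The four bucket names and thresholds here are illustrative assumptions, not the thesis's actual grouping.

```python
# Hypothetical sketch: bucket a warp's memory instruction by how many
# distinct cache lines its 32 lane addresses touch. Fewer lines means
# higher intra-warp spatial locality and better coalescing in the L1
# data cache. Thresholds and labels are illustrative only.

CACHE_LINE = 128  # bytes; a typical L1 line size on NVIDIA GPUs

def classify_access(addresses):
    """Classify one warp-level memory instruction by distinct lines touched."""
    lines_touched = len({addr // CACHE_LINE for addr in addresses})
    if lines_touched == 1:
        return "fully-coalesced"
    if lines_touched <= 4:
        return "high-locality"
    if lines_touched <= 16:
        return "moderate-locality"
    return "divergent"

# 32 consecutive 4-byte loads fit in one 128-byte line: fully coalesced.
print(classify_access([lane * 4 for lane in range(32)]))   # fully-coalesced
# A 128-byte stride gives each lane its own line: worst-case divergence.
print(classify_access([lane * 128 for lane in range(32)])) # divergent
```

A divergent instruction in this scheme issues many L1 transactions for a single warp-level access, which is how its cost surfaces in the L1 data-cache behavior mentioned above.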
    URI
    https://etd.iisc.ac.in/handle/2005/5453
    Collections
    • Department of Computational and Data Sciences (CDS)

    etd@IISc is a joint service of SERC & J R D Tata Memorial (JRDTML) Library || Powered by DSpace software || DuraSpace
    Contact Us | Send Feedback | Thesis Templates