
dc.contributor.advisor: Jacob, Matthew T
dc.contributor.author: Morkhande, Rahul Raj Kumar
dc.date.accessioned: 2021-10-21T05:09:11Z
dc.date.available: 2021-10-21T05:09:11Z
dc.date.submitted: 2018
dc.identifier.uri: https://etd.iisc.ac.in/handle/2005/5453
dc.description.abstract: GPGPUs have emerged as high-performance computing platforms and are used to boost the performance of general non-graphics applications from various scientific domains. These applications span varied areas such as social networks, defense, bioinformatics, data science, medical imaging, fluid dynamics, etc. [3]. To efficiently exploit the computing potential of these accelerators, an application must be well mapped to the underlying architecture. As a result, applications running on GPGPUs exhibit a range of behaviors. Applications are characterized as regular or irregular based on this behavior. Regular applications typically operate on array-like data structures whose run-time behavior can be statically predicted, whereas irregular applications operate on pointer-based data structures such as graphs and trees [2]. Irregular applications are generally characterized by a high degree of data-dependent control flow and memory accesses. In the literature, we find various efforts to characterize such applications, particularly the irregular ones, whose behavior results in run-time bottlenecks. Burtscher et al. [2] investigated various irregular GPGPU applications by quantifying control-flow and memory-access behaviors on a real GPU device. Molly et al. [4] analyzed performance aspects of these behaviors on a cycle-accurate GPU simulator [1]. Qiumin Xu et al. [5] studied execution characteristics of graph-based applications on GPGPUs. All of these works characterized divergence at the kernel level but not at the thread level.

In this work, we provide an in-depth characterization of three divergences, resulting from 1) workload distribution, 2) memory access, and 3) control-flow behavior, at different levels of the GPU thread hierarchy, with the purpose of analyzing and quantifying divergence characteristics at the warp, thread-block, and kernel level.

In Chapter 1, we review certain aspects of CPUs and GPUs and how they differ from each other. We then discuss various characteristics of GPGPU applications. In Chapter 2, we provide background on GPU architectures, the CUDA programming model, and the GPU SIMD execution model. We briefly explain key CUDA programming concepts such as the GPU thread hierarchy and the different addressable memory spaces. We describe the various behaviors that cause divergence across parallel threads. We then review the related work in the context of the divergence studied here, followed by the contributions of this thesis. In Chapter 3, we explain our methodology for quantifying workload and branch divergence across threads at the various levels of the thread organization. We then present our characterization methodology for quantifying the divergent aspects of memory instructions. In Chapter 4, we present our chosen benchmarks, taken from various suites, and show the baseline GPGPU-Sim configuration used to evaluate our methodology. We then discuss our characterization results for workload and branch divergence at the warp, thread-block, and kernel level for some interesting kernels of the applications. We examine the divergence behavior of graph-based applications and show how it varies across threads. We present our characterization of the memory-access behavior of irregular applications using an instruction classification based on spatial locality. Finally, we discuss the relationship between throughput and the divergence measures by studying their correlation coefficients.
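As a concrete illustration of the data-dependent control flow and memory accesses described above, the following CUDA sketch (illustrative only and not taken from the thesis; the kernel name, CSR graph layout, and parameters are assumptions) shows how a simple per-vertex graph traversal produces both workload/branch divergence and memory divergence within a warp:

    // Hypothetical CUDA kernel: one thread per vertex of a CSR-format graph.
    // The neighbour count differs per vertex, so threads in the same warp run
    // the loop a different number of times (workload/branch divergence), and
    // the neighbour indices are data dependent, so the loads of values[] touch
    // scattered cache lines (memory divergence).
    __global__ void sum_neighbours(const int *row_ptr, const int *col_idx,
                                   const float *values, float *out,
                                   int num_vertices)
    {
        int v = blockIdx.x * blockDim.x + threadIdx.x;   // global thread id
        if (v >= num_vertices) return;                   // boundary branch

        float acc = 0.0f;
        // Data-dependent trip count: the warp stays resident until its
        // longest-running lane finishes, idling the other lanes.
        for (int e = row_ptr[v]; e < row_ptr[v + 1]; ++e) {
            acc += values[col_idx[e]];   // data-dependent, typically uncoalesced load
        }
        out[v] = acc;
    }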
To summarize, we quantified and analyzed the control-flow and workload divergence across threads at the warp, thread-block, and kernel level for a diverse collection of 12 GPGPU applications that exhibit both regular and irregular behavior. Using the threads' hardware utilization efficiency and a measure we call 'Average normalized instructions per thread', we quantify branch and workload divergence, respectively. Our characterization technique for memory divergence classifies memory instructions into four groups based on the intra-warp spatial locality of the instructions. We then quantify the impact of memory divergence using the behavior of the GPU L1 data cache.
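The two sketches below (host-side C++ as it would appear in a CUDA tool, hypothetical and not the thesis's code) show one common way such per-warp measures can be computed from an instruction trace of the kind GPGPU-Sim produces; the exact definitions used in the thesis may differ.

    #include <bitset>
    #include <cstdint>
    #include <set>
    #include <vector>

    // Warp hardware-utilization efficiency: average fraction of the 32 lanes
    // that are active per issued warp instruction. 1.0 means no branch
    // divergence; lower values indicate lanes disabled by divergent branches.
    double warp_utilization(const std::vector<uint32_t> &active_masks)
    {
        if (active_masks.empty()) return 1.0;
        double active_lanes = 0.0;
        for (uint32_t mask : active_masks)
            active_lanes += std::bitset<32>(mask).count();  // enabled lanes
        return active_lanes / (32.0 * active_masks.size());
    }

    // Intra-warp spatial locality of one memory instruction: the number of
    // distinct 128-byte cache lines its lanes touch (1 = fully coalesced,
    // 32 = fully divergent). A four-way grouping could then bucket this count.
    int distinct_lines(const std::vector<uint64_t> &lane_addresses)
    {
        std::set<uint64_t> lines;
        for (uint64_t addr : lane_addresses)
            lines.insert(addr / 128);   // assume 128-byte L1 line granularity
        return static_cast<int>(lines.size());
    }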
dc.language.iso: en_US
dc.relation.ispartofseries: ;G29424
dc.rights: I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.
dc.subject: GPGPU
dc.subject: Average normalized instructions per thread
dc.subject: CPU
dc.subject: GPU
dc.subject: graph-based application
dc.subject.classification: Research Subject Categories::TECHNOLOGY::Information technology::Computer science
dc.title: Characterization of Divergence resulting from Workload, Memory and Control-Flow behavior in GPGPU Applications
dc.type: Thesis
dc.degree.name: MSc Engg
dc.degree.level: Masters
dc.degree.grantor: Indian Institute of Science
dc.degree.discipline: Engineering

