Heterogeneity Aware Shared DRAM Cache for Integrated Heterogeneous Architectures

Patil, Adarsh

dc.contributor.advisor	Govindarajan, R
dc.contributor.author	Patil, Adarsh
dc.date.accessioned	2018-10-26T09:41:25Z
dc.date.available	2018-10-26T09:41:25Z
dc.date.submitted	2017
dc.identifier.uri	https://etd.iisc.ac.in/handle/2005/4124
dc.description.abstract	Integrated Heterogeneous System (IHS) processors pack throughput-oriented GPGPUs along-side latency-oriented CPUs on the same die sharing certain resources, e.g., shared last level cache, network-on-chip (NoC), and the main memory. They also share virtual and physical address spaces and unify the memory hierarchy. The IHS architecture allows for easier programmability, data management and efficiency. However, the significant disparity in the demands for memory and other shared resources between the GPU cores and CPU cores poses significant problems in exploiting the full potential of this architecture. In this work, we propose adding a large capacity stacked DRAM, used as a shared last level cache, for the IHS processors. The reduced latency of access and large bandwidth provided by the DRAM cache can help improve performance respectively of CPU and GPGPU while the large capacity can help contain the working set of the IHS workloads. However, adding the DRAM cache naively leaves significant performance on the table due to the disparate demands from CPU and GPU cores for DRAM cache and memory accesses. In particular, the imbalance can significantly reduce the performance benefits that the CPU cores would have otherwise enjoyed with the introduction of the DRAM cache. This necessitates a heterogeneity-aware management of this shared resource for improved performance. To address this, in this thesis, we propose three simple techniques to enhance the performance of CPU application while ensuring very little or no performance impact to the GPU. Specifically, we propose (i) PrIS, a prioritization scheme for scheduling CPU requests at the DRAM cache controller, (ii) ByE, a selective and temporal bypassing scheme for CPU requests at the DRAM cache and (iii) Chaining, an occupancy controlling mechanism for GPU lines in the DRAM cache through pseudoassociativity. The resulting cache, HAShCache, is heterogeneity-aware and can adapt dynamically to address the inherent disparity of demands in an IHS architecture with simple light weight schemes. We enhance the gem5-gpu simulator to model an IHS architecture with stacked DRAM as a cache, coherent GPU L2 cache and CPU caches and a shared unified physical memory. Using this setup we perform detailed experimental evaluation of the proposed HAShCache and demonstrate an average system performance (combined performance of CPU and GPU cores) improvement of 41% over a naive DRAM cache and over 100% improvement over a baseline system with no stacked DRAM cache.	en_US
dc.language.iso	en_US	en_US
dc.relation.ispartofseries	G28533;
dc.rights	I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation	en_US
dc.subject	Algorithms	en_US
dc.subject	Integrated Heterogeneous Systems	en_US
dc.subject	DRAM Cache	en_US
dc.subject	HAShCache	en_US
dc.subject	HAShCache	en_US
dc.subject	Row Buffer Hits (RBH)	en_US
dc.subject	Bank Level Parallelism (BLP)	en_US
dc.subject	IHS Architecture	en_US
dc.subject.classification	Computer Science	en_US
dc.title	Heterogeneity Aware Shared DRAM Cache for Integrated Heterogeneous Architectures	en_US
dc.type	Thesis	en_US
dc.degree.name	MSc Engg	en_US
dc.degree.level	Masters	en_US
dc.degree.grantor	Indian Institute of Science	en_US
dc.degree.discipline	Engineering	en_US

Files in this item

Name:: G28533-Abs.pdf
Size:: 61.97Kb
Format:: PDF
Description:: Thesis(Abstract)

View/Open

Name:: G28533.pdf
Size:: 1.498Mb
Format:: PDF
Description:: Thesis(Full Text)

View/Open

This item appears in the following Collection(s)

Computer Science and Automation (CSA) [394]

Show simple item record