EMF: System Design and Challenges for Disaggregated GPUs in datacenters for Efficiency, Modularity and Flexibility

Guleria, Anubhav

dc.contributor.advisor	Lakshmi, J
dc.contributor.author	Guleria, Anubhav
dc.date.accessioned	2021-01-27T09:36:04Z
dc.date.available	2021-01-27T09:36:04Z
dc.date.submitted	2020
dc.identifier.uri	https://etd.iisc.ac.in/handle/2005/4831
dc.description.abstract	With Dennard Scaling phasing out in the mid-2000s, architectural scaling and hardware specialization have taken centre stage in providing performance benefits as Moore's law stalls. One outcome of this hardware specialization is the GPU, which exploits the Data Level Parallelism in an application. One approach is to augment the existing infrastructure with accelerators like GPUs that cater to data-parallel, throughput-centric workloads ranging from AI and HPC to Visualization. Further, the availability of GPUs in Public Cloud offerings has expedited their mass adoption. At the same time, these modern cloud-based applications are placing increasing demands on infrastructure in terms of versatility, performance and efficiency. The high acquisition and operational cost of GPUs necessitates their optimal utilization while avoiding common pitfalls like resource stranding. Disaggregating expensive and power-hungry GPUs will enable a cost-efficient and adaptive ecosystem for their deployment. In this work, we first quantify the gains associated with disaggregated GPU deployments using metrics like failed VM requests and GPU Watt-hours consumption. For this, we use QUADD-SIM, a simulator we built to model, quantify, and contrast different facets of these emerging GPU deployments. Using QUADD-SIM, we model different VM and resource provisioning aspects of disaggregated GPU deployments. We simulate realistic AI workload requests for a period of 3 months with characteristics derived from recent public datacenter traces. Our results attest that disaggregated GPU deployment strategies outperform traditional GPU deployments in terms of failed VM requests and GPU Watt-hours consumption. Through extensive experimentation, we show that disaggregated GPU deployments serviced 5.14% and 7.90% additional failed VM requests while consuming 10.92% and 3.30% fewer GPU Watt-hours than the traditional deployment.
As our second contribution, we identify how the disaggregation constructs could be met at different abstraction levels of the NVIDIA GPU computing stack. We introduce the notion of a Disaggregation Plane to understand the feasibility and limitations of a disaggregated solution. We then evaluate various GPU disaggregation approaches against the Disaggregation Plane using the following metrics: 1) Composability, 2) Independent existence, and 3) Backward Compatibility. Based on this analysis, we propose EMF: a rack-level, open system for GPU Disaggregation. In addition to supporting the core disaggregation constructs (i.e. independent existence and composability), EMF also provides backward compatibility. While presenting the design of EMF, we highlight key design abstractions, elements, and some pressing issues in realizing the system. Lastly, we evaluate the performance impact by quantifying worst-case latency overheads due to disaggregation. We model Host device driver and GPU interactions for data transfer operations over PCIe in terms of Transaction Layer Packets (TLPs) to understand the performance impact of our design. Further evaluation with 6 Deep Learning applications shows that these overheads could vary from 7.6% to 20.2%, justifying the practicality of our design. We found that the latency overheads are directly correlated with the Average Throughput of the application, and that applications with short lifetimes and bursty data-transfer characteristics may show visible performance degradation.	en_US
dc.language.iso	en_US	en_US
dc.relation.ispartofseries	;G29701
dc.rights	I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation	en_US
dc.subject	GPU	en_US
dc.subject	QUADD-SIM	en_US
dc.subject.classification	Research Subject Categories::TECHNOLOGY::Information technology	en_US
dc.title	EMF: System Design and Challenges for Disaggregated GPUs in datacenters for Efficiency, Modularity and Flexibility	en_US
dc.type	Thesis	en_US
dc.degree.name	MTech (Res)	en_US
dc.degree.level	Masters	en_US
dc.degree.grantor	Indian Institute of Science	en_US
dc.degree.discipline	Engineering	en_US

Files in this item

Name: G29701.pdf
Size: 3.550Mb
Format: PDF
Description: Thesis full text


This item appears in the following Collection(s)

Department of Computational and Data Sciences (CDS) [102]
