dc.contributor.advisor: Lakshmi, J
dc.contributor.author: Guleria, Anubhav
dc.date.accessioned: 2021-01-27T09:36:04Z
dc.date.available: 2021-01-27T09:36:04Z
dc.date.submitted: 2020
dc.identifier.uri: https://etd.iisc.ac.in/handle/2005/4831
dc.description.abstract: With Dennard scaling phasing out in the mid-2000s and Moore's law already stalling, architectural scaling and hardware specialization have taken centre stage in delivering performance benefits. One outcome of this hardware specialization is the GPU, which exploits the data-level parallelism in an application. One approach is to augment existing infrastructure with accelerators such as GPUs that cater to data-parallel, throughput-centric workloads ranging from AI and HPC to visualization. Further, the availability of GPUs in public cloud offerings has expedited their mass adoption. At the same time, modern cloud-based applications are placing increasing demands on infrastructure in terms of versatility, performance, and efficiency. The high acquisition and operational cost of GPUs necessitates their optimal utilization while avoiding common pitfalls such as resource stranding. Disaggregating expensive and power-hungry GPUs will enable a cost-efficient and adaptive ecosystem for their deployment. In this work, we first quantify the gains associated with disaggregated GPU deployments using metrics such as failed VM requests and GPU watt-hours consumed. For this, we use QUADD-SIM, a simulator we built to model, quantify, and contrast different facets of these emerging GPU deployments. Using QUADD-SIM, we model different VM and resource provisioning aspects of disaggregated GPU deployments. We simulate realistic AI workload requests over a period of 3 months, with characteristics derived from recent public datacenter traces. Our results attest that disaggregated GPU deployment strategies outperform traditional GPU deployments in terms of failed VM requests and GPU watt-hours consumed. Extensive experimentation shows that 5.14% and 7.90% additional failed VM requests were serviced by disaggregated GPU deployments, which consumed 10.92% and 3.30% less GPU watt-hours than the traditional deployment.
As our second contribution, we identify how the disaggregation constructs can be met at different abstraction levels of the NVIDIA GPU computing stack. We introduce the notion of a Disaggregation Plane to understand the feasibility and limitations of a disaggregated solution. We then evaluate various GPU disaggregation approaches against the Disaggregation Plane using the following metrics: 1) composability, 2) independent existence, and 3) backward compatibility. Based on this analysis, we propose EMF: a rack-level, open system for GPU disaggregation. In addition to supporting the core disaggregation constructs (i.e., independent existence and composability), EMF also provides backward compatibility. While presenting the design of EMF, we highlight key design abstractions and elements, along with some pressing issues in realizing the system. Lastly, we evaluate the performance impact by quantifying the worst-case latency overheads due to disaggregation. We model the interactions between the host device driver and the GPU for data transfer operations over PCIe in terms of TLPs to understand the performance impact of our design. Further evaluation with 6 deep learning applications shows that these overheads vary from 7.6% to 20.2%, justifying the practicality of our design. We found that the latency overheads are directly correlated with the average throughput of the application, and that short-lived applications with bursty data-transfer characteristics may show visible performance degradation. (en_US)
dc.language.iso: en_US (en_US)
dc.relation.ispartofseries: ;G29701
dc.rights: I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation. (en_US)
dc.subject: GPU (en_US)
dc.subject: QUADD-SIM (en_US)
dc.subject.classification: Research Subject Categories::TECHNOLOGY::Information technology (en_US)
dc.title: EMF: System Design and Challenges for Disaggregated GPUs in datacenters for Efficiency, Modularity and Flexibility (en_US)
dc.type: Thesis (en_US)
dc.degree.name: MTech (Res) (en_US)
dc.degree.level: Masters (en_US)
dc.degree.grantor: Indian Institute of Science (en_US)
dc.degree.discipline: Engineering (en_US)

