Show simple item record

dc.contributor.advisorSimmhan, Yogesh
dc.contributor.authorDindokar, Ravikant Devidas
dc.date.accessioned2020-04-30T06:33:09Z
dc.date.available2020-04-30T06:33:09Z
dc.date.submitted2018
dc.identifier.urihttps://etd.iisc.ac.in/handle/2005/4391
dc.description.abstractGraph processing at scales of millions-billions of vertices and edges has become common to solve real-world problems in domains like social networks, smart cities and genomics. Distributed "Big Data" platforms for graph processing such as Pregel, GraphLab, and GoFFish provide a simpli ed component-centric programming model based on a Bulk Synchronous Parallel (BSP) model that iteratively executes at scale on commodity clusters and Clouds. Graph applications tend to be irregular and pose unique challenges for efficient execution on distributed resources. The topological variability of graphs and the asymmetric behavior of graph algorithms leads to an uneven distribution of load across Cloud Virtual Machines (VMs) holding the graph partitions. This results in poor VM utilization across different iterations, and increases monetary costs on pay-as-you-go public Cloud VMs that are billed for every coreminute acquired. Efforts to balance the execution load across different VMs at runtime also introduce overheads that increase the algorithms makespan. This gap offers the opportunity to dynamically control the VM's elasticity at each iteration such that only active partitions consume VM resources, thereby reducing the monetary cost of execution while minimizing the increase in runtime of the algorithm. We make two key contributions to achieve this. The rst is on a priori modeling of the characteristics of the distributed graph algorithm's execution on a given large graph to determine the active partitions in each iteration. For this, we propose a meta-graph as a coarse-grained analytical sketch of the entire graph to quickly estimate topological, complexity and execution properties of the large graph, including its partition activation schedule for non-stationary graph algorithms like Breadth First Search. The second contribution is a heuristic algorithm that uses the partition activation schedule to determine the fewest number of VMs required at each iteration, and the dynamic placement and movement of the partitions on to these VMs. Here, we scale-out and scale-in the VMs ondemand to ensure high utilization, while cognizant of the VM billing increment and partition movement costs. This decouples the costly partitioning, done once per graph, from the fast placement strategies done per algorithm run. Our results with real-world graphs show the heuristic to reduce VM costs by up to 75% while the makespan no more than doubles.en_US
dc.language.isoen_USen_US
dc.relation.ispartofseries;G28800
dc.rightsI grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertationen_US
dc.subjectVirtual Machinesen_US
dc.subjectGraph processingen_US
dc.subject.classificationResearch Subject Categories::TECHNOLOGY::Information technology::Image analysisen_US
dc.titleModeling and Adaptive Scheduling Strategies for Distributed Graph Algorithmsen_US
dc.typeThesisen_US
dc.degree.nameMSc Enggen_US
dc.degree.levelMastersen_US
dc.degree.grantorIndian Institute of Scienceen_US
dc.degree.disciplineEngineeringen_US


Files in this item

This item appears in the following Collection(s)

Show simple item record