Modeling and Adaptive Scheduling Strategies for Distributed Graph Algorithms

Dindokar, Ravikant Devidas

dc.contributor.advisor	Simmhan, Yogesh
dc.contributor.author	Dindokar, Ravikant Devidas
dc.date.accessioned	2020-04-30T06:33:09Z
dc.date.available	2020-04-30T06:33:09Z
dc.date.submitted	2018
dc.identifier.uri	https://etd.iisc.ac.in/handle/2005/4391
dc.description.abstract	Graph processing at scales of millions-billions of vertices and edges has become common to solve real-world problems in domains like social networks, smart cities and genomics. Distributed "Big Data" platforms for graph processing such as Pregel, GraphLab, and GoFFish provide a simpli ed component-centric programming model based on a Bulk Synchronous Parallel (BSP) model that iteratively executes at scale on commodity clusters and Clouds. Graph applications tend to be irregular and pose unique challenges for efficient execution on distributed resources. The topological variability of graphs and the asymmetric behavior of graph algorithms leads to an uneven distribution of load across Cloud Virtual Machines (VMs) holding the graph partitions. This results in poor VM utilization across different iterations, and increases monetary costs on pay-as-you-go public Cloud VMs that are billed for every coreminute acquired. Efforts to balance the execution load across different VMs at runtime also introduce overheads that increase the algorithms makespan. This gap offers the opportunity to dynamically control the VM's elasticity at each iteration such that only active partitions consume VM resources, thereby reducing the monetary cost of execution while minimizing the increase in runtime of the algorithm. We make two key contributions to achieve this. The rst is on a priori modeling of the characteristics of the distributed graph algorithm's execution on a given large graph to determine the active partitions in each iteration. For this, we propose a meta-graph as a coarse-grained analytical sketch of the entire graph to quickly estimate topological, complexity and execution properties of the large graph, including its partition activation schedule for non-stationary graph algorithms like Breadth First Search. The second contribution is a heuristic algorithm that uses the partition activation schedule to determine the fewest number of VMs required at each iteration, and the dynamic placement and movement of the partitions on to these VMs. Here, we scale-out and scale-in the VMs ondemand to ensure high utilization, while cognizant of the VM billing increment and partition movement costs. This decouples the costly partitioning, done once per graph, from the fast placement strategies done per algorithm run. Our results with real-world graphs show the heuristic to reduce VM costs by up to 75% while the makespan no more than doubles.	en_US
dc.language.iso	en_US	en_US
dc.relation.ispartofseries	;G28800
dc.rights	I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation	en_US
dc.subject	Virtual Machines	en_US
dc.subject	Graph processing	en_US
dc.subject.classification	Research Subject Categories::TECHNOLOGY::Information technology::Image analysis	en_US
dc.title	Modeling and Adaptive Scheduling Strategies for Distributed Graph Algorithms	en_US
dc.type	Thesis	en_US
dc.degree.name	MSc Engg	en_US
dc.degree.level	Masters	en_US
dc.degree.grantor	Indian Institute of Science	en_US
dc.degree.discipline	Engineering	en_US

Files in this item

Name:: G28800.pdf
Size:: 8.204Mb
Format:: PDF
Description:: Thesis full text

View/Open

This item appears in the following Collection(s)

Computer Science and Automation (CSA) [394]

Show simple item record