• Login
    View Item 
    •   etd@IISc
    • Division of Electrical, Electronics, and Computer Science (EECS)
    • Computer Science and Automation (CSA)
    • View Item
    •   etd@IISc
    • Division of Electrical, Electronics, and Computer Science (EECS)
    • Computer Science and Automation (CSA)
    • View Item
    JavaScript is disabled for your browser. Some features of this site may not work without it.

    Modeling and Adaptive Scheduling Strategies for Distributed Graph Algorithms

    View/Open
    Thesis full text (8.204Mb)
    Author
    Dindokar, Ravikant Devidas
    Metadata
    Show full item record
    Abstract
    Graph processing at scales of millions-billions of vertices and edges has become common to solve real-world problems in domains like social networks, smart cities and genomics. Distributed "Big Data" platforms for graph processing such as Pregel, GraphLab, and GoFFish provide a simpli ed component-centric programming model based on a Bulk Synchronous Parallel (BSP) model that iteratively executes at scale on commodity clusters and Clouds. Graph applications tend to be irregular and pose unique challenges for efficient execution on distributed resources. The topological variability of graphs and the asymmetric behavior of graph algorithms leads to an uneven distribution of load across Cloud Virtual Machines (VMs) holding the graph partitions. This results in poor VM utilization across different iterations, and increases monetary costs on pay-as-you-go public Cloud VMs that are billed for every coreminute acquired. Efforts to balance the execution load across different VMs at runtime also introduce overheads that increase the algorithms makespan. This gap offers the opportunity to dynamically control the VM's elasticity at each iteration such that only active partitions consume VM resources, thereby reducing the monetary cost of execution while minimizing the increase in runtime of the algorithm. We make two key contributions to achieve this. The rst is on a priori modeling of the characteristics of the distributed graph algorithm's execution on a given large graph to determine the active partitions in each iteration. For this, we propose a meta-graph as a coarse-grained analytical sketch of the entire graph to quickly estimate topological, complexity and execution properties of the large graph, including its partition activation schedule for non-stationary graph algorithms like Breadth First Search. The second contribution is a heuristic algorithm that uses the partition activation schedule to determine the fewest number of VMs required at each iteration, and the dynamic placement and movement of the partitions on to these VMs. Here, we scale-out and scale-in the VMs ondemand to ensure high utilization, while cognizant of the VM billing increment and partition movement costs. This decouples the costly partitioning, done once per graph, from the fast placement strategies done per algorithm run. Our results with real-world graphs show the heuristic to reduce VM costs by up to 75% while the makespan no more than doubles.
    URI
    https://etd.iisc.ac.in/handle/2005/4391
    Collections
    • Computer Science and Automation (CSA) [392]

    etd@IISc is a joint service of SERC & J R D Tata Memorial (JRDTML) Library || Powered by DSpace software || DuraSpace
    Contact Us | Send Feedback | Thesis Templates
    Theme by 
    Atmire NV
     

     

    Browse

    All of etd@IIScCommunities & CollectionsTitlesAuthorsAdvisorsSubjectsBy Thesis Submission DateThis CollectionTitlesAuthorsAdvisorsSubjectsBy Thesis Submission Date

    My Account

    LoginRegister

    etd@IISc is a joint service of SERC & J R D Tata Memorial (JRDTML) Library || Powered by DSpace software || DuraSpace
    Contact Us | Send Feedback | Thesis Templates
    Theme by 
    Atmire NV