• Login
    View Item 
    •   etd@IISc
    • Division of Interdisciplinary Research
    • Department of Computational and Data Sciences (CDS)
    • View Item
    •   etd@IISc
    • Division of Interdisciplinary Research
    • Department of Computational and Data Sciences (CDS)
    • View Item
    JavaScript is disabled for your browser. Some features of this site may not work without it.

    Benchmarking and Scheduling Strategies for Distributed Stream Processing

    View/Open
    G28612.pdf (5.502Mb)
    Date
    2018-08-20
    Author
    Shukla, Anshu
    Metadata
    Show full item record
    Abstract
    The velocity dimension of Big Data refers to the need to rapidly process data that arrives continuously as streams of messages or events. Distributed Stream Processing Systems (DSPS) refer to distributed programming and runtime platforms that allow users to define a composition of dataflow logic that are executed on distributed resources over streams of incoming messages. A DSPS uses commodity clusters and Cloud Virtual Machines (VMs) for its execution. In order to meet the required performance for these applications, the DSPS needs to schedule these dataßows efficiently over the resources. Despite their growing use, resource scheduling for DSPSÕs tends to be done in an ad hoc manner, favoring empirical and reactive approaches, rather than a model-driven and analytical approach. Such empirical strategies may arrive at an approximate schedule for the dataflow that needs further tuning to meet the quality of service. We propose a model-based scheduling approach that makes use of performance profiles and benchmarks developed for tasks in the dataßow to plan both the resource allocation and the resource mapping that together form the schedule planning process. We propose the Model Based Allocation (MBA) and the Slot Aware Mapping (SAM) approaches that efectively utilize knowledge of the performance model of logic tasks to provide an efficient and predictable scheduling behavior. We implemented and validate these algorithms using the popular open source Apache Storm DSPS for several micro and application dataflows. The results show that our model-driven approach is able to reduce the amount of required resources (VMs) by 30% − 50% relative to existing techniques. Also we see that our strategies o↵er a predictable behavior that ensures that the expected and actual rates supported and resources used match closely. This can enable deterministic schedule planning even under dynamic conditions. Besides this static scheduling, we also examine the ability to dynamically consolidate tasks onto fewer VMs when the load on the dataßow decreases or the VMs get fragmented. We propose reliable task migration models for Apache Storm dataßows that are able to rapidly move the task assignment in the cluster, and resume the dataflow execution without any message loss.
    URI
    https://etd.iisc.ac.in/handle/2005/3984
    Collections
    • Department of Computational and Data Sciences (CDS) [102]

    Related items

    Showing items related by title, author, creator and subject.

    • Efficient Compilation Of Stream Programs Onto Multi-cores With Accelerators 

      Udupa, Abhishek (2010-12-30)
      Over the past two decades, microprocessor manufacturers have typically relied on wider issue widths and deeper pipelines to obtain performance improvements for single threaded applications. However, in the recent years, ...
    • Efficient Frequent Closed Itemset Algorithms With Applications To Stream Mining And Classification 

      Ranganath, B N (2010-08-24)
      Data mining is an area to find valid, novel, potentially useful, and ultimately understandable abstractions in a data. Frequent itemset mining is one of the important data mining approaches to find those abstractions in ...
    • New Approaches And Experimental Studies On - Alegebraic Attacks On Stream Ciphers 

      Pillai, N Rajesh (2015-02-05)
      Algebraic attacks constitute an effective class of cryptanalytic attacks which have come up recently. In algebraic attacks, the relations between the input, output and the key are expressed as a system of equations and ...

    etd@IISc is a joint service of SERC & J R D Tata Memorial (JRDTML) Library || Powered by DSpace software || DuraSpace
    Contact Us | Send Feedback | Thesis Templates
    Theme by 
    Atmire NV
     

     

    Browse

    All of etd@IIScCommunities & CollectionsTitlesAuthorsAdvisorsSubjectsBy Thesis Submission DateThis CollectionTitlesAuthorsAdvisorsSubjectsBy Thesis Submission Date

    My Account

    LoginRegister

    etd@IISc is a joint service of SERC & J R D Tata Memorial (JRDTML) Library || Powered by DSpace software || DuraSpace
    Contact Us | Send Feedback | Thesis Templates
    Theme by 
    Atmire NV