Benchmarking and Scheduling Strategies for Distributed Stream Processing

Shukla, Anshu

dc.contributor.advisor	Simmhan, Yogesh
dc.contributor.author	Shukla, Anshu
dc.date.accessioned	2018-08-20T14:04:08Z
dc.date.accessioned	2018-08-28T09:48:11Z
dc.date.available	2018-08-20T14:04:08Z
dc.date.available	2018-08-28T09:48:11Z
dc.date.issued	2018-08-20
dc.date.submitted	2017
dc.identifier.uri	https://etd.iisc.ac.in/handle/2005/3984
dc.identifier.abstract	https://etd.iisc.ac.in/static/etd/abstracts/4872/G28612-Abs.pdf	en_US
dc.description.abstract	The velocity dimension of Big Data refers to the need to rapidly process data that arrives continuously as streams of messages or events. Distributed Stream Processing Systems (DSPS) are distributed programming and runtime platforms that allow users to define a composition of dataflow logic that is executed on distributed resources over streams of incoming messages. A DSPS executes on commodity clusters and Cloud Virtual Machines (VMs). To meet the required performance for these applications, the DSPS needs to schedule these dataflows efficiently over the resources. Despite their growing use, resource scheduling for DSPSs tends to be done in an ad hoc manner, favoring empirical and reactive approaches rather than a model-driven and analytical approach. Such empirical strategies may arrive at an approximate schedule for the dataflow that needs further tuning to meet the required quality of service. We propose a model-based scheduling approach that uses performance profiles and benchmarks developed for tasks in the dataflow to plan both the resource allocation and the resource mapping, which together form the schedule planning process. We propose the Model Based Allocation (MBA) and the Slot Aware Mapping (SAM) approaches, which effectively use knowledge of the performance model of logic tasks to provide efficient and predictable scheduling behavior. We implement and validate these algorithms using the popular open-source Apache Storm DSPS for several micro and application dataflows. The results show that our model-driven approach is able to reduce the required resources (VMs) by 30–50% relative to existing techniques. We also see that our strategies offer predictable behavior, with the expected and actual supported rates and resources used matching closely. This can enable deterministic schedule planning even under dynamic conditions.
Besides this static scheduling, we also examine the ability to dynamically consolidate tasks onto fewer VMs when the load on the dataflow decreases or the VMs become fragmented. We propose reliable task migration models for Apache Storm dataflows that can rapidly move task assignments within the cluster and resume dataflow execution without any message loss.	en_US
dc.language.iso	en_US	en_US
dc.relation.ispartofseries	G28612	en_US
dc.subject	Distributed Stream Processing	en_US
dc.subject	Distributed Programming	en_US
dc.subject	Apache Storm Dataflows	en_US
dc.subject	Stream Processing Benchmark	en_US
dc.subject	Distributed Stream Processing Systems (DSPS)	en_US
dc.subject	IoT Applications	en_US
dc.subject	Streaming Dataflows	en_US
dc.subject	Cloud Virtual Machines (VMs)	en_US
dc.subject	Model Based Allocation (MBA)	en_US
dc.subject	Slot Aware Mapping (SAM)	en_US
dc.subject.classification	Computer Science	en_US
dc.title	Benchmarking and Scheduling Strategies for Distributed Stream Processing	en_US
dc.type	Thesis	en_US
dc.degree.name	MSc Engg	en_US
dc.degree.level	Masters	en_US
dc.degree.discipline	Faculty of Engineering	en_US

Files in this item

Name: G28612.pdf
Size: 5.502Mb
Format: PDF

This item appears in the following Collection(s)

Department of Computational and Data Sciences (CDS) [118]

Related items

Efficient Compilation Of Stream Programs Onto Multi-cores With Accelerators

Efficient Frequent Closed Itemset Algorithms With Applications To Stream Mining And Classification

New Approaches And Experimental Studies On - Alegebraic Attacks On Stream Ciphers