dc.contributor.advisor | Simmhan, Yogesh | |
dc.contributor.author | Chaturvedi, Shilpa | |
dc.date.accessioned | 2021-09-16T06:59:24Z | |
dc.date.available | 2021-09-16T06:59:24Z | |
dc.date.submitted | 2018 | |
dc.identifier.uri | https://etd.iisc.ac.in/handle/2005/5299 | |
dc.description.abstract | Internet of Things (IoT) deployments comprising sensors and actuators collect observational data and provide continuous streams of data, often called streaming data or fast data. Smart Cities use such IoT technologies to provide effective citizen services and to improve the efficiency of utility infrastructure. Such Smart City applications need to analyze and process these data streams in near real-time to make decisions or provide public services. Distributed Stream Processing Systems (DSPS) such as Apache Storm and Spark Streaming are data platforms tailored to handle streaming data; they enable the composition of dataflows that execute continuously over one or more such data streams.
As the number of IoT applications using shared urban sensor streams continues to grow, many of them will perform duplicate pre-processing and analytics tasks. This offers an opportunity to collaboratively reuse the outputs of overlapping dataflows, thereby improving resource efficiency.
We propose dataflow reuse algorithms that, given a submitted dataflow, identify the intersection of reusable tasks and streams from existing dataflows to form a merged dataflow, with guaranteed equivalence of their output streams. We also propose algorithms to unmerge dataflows when they are removed and to defragment partially reused dataflows. We implement these algorithms for Apache Storm and validate their performance and resource savings using 86 real and synthetic dataflows from the eScience and IoT domains. Our reuse strategies reduce the number of running tasks by 34–45% and the cumulative CPU usage by 29–63%. Including defragmentation of incremental dataflows yields monetary savings on Cloud resources of 36–44% compared to dataflows without reuse, with limited redeployment overheads.
As a further extension, we explore the use of such reuse strategies to enhance the resilience of streaming dataflows in a cyber-physical system that is exposed to external threats. When dataflows run in a shared environment, they are even more susceptible to attacks and threats. Moving Target Defense (MTD) is a mitigation strategy that introduces spatiotemporal variations into the system to obfuscate it from attackers. We explore how some existing MTD techniques, together with our reuse strategies, can be adopted to provide resilience in DSPS. | en_US |
dc.language.iso | en_US | en_US |
dc.relation.ispartofseries | ;G29307 | |
dc.rights | I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation. | en_US |
dc.subject | Internet of Things | en_US |
dc.subject | Smart City applications | en_US |
dc.subject | Distributed Stream Processing Systems | en_US |
dc.subject | Moving Target Defense | en_US |
dc.subject.classification | Research Subject Categories::TECHNOLOGY::Information technology::Computer science | en_US |
dc.title | Efficient and Resilient Stream Processing in Distributed Shared Environment | en_US |
dc.type | Thesis | en_US |
dc.degree.name | MS | en_US |
dc.degree.level | Masters | en_US |
dc.degree.grantor | Indian Institute of Science | en_US |
dc.degree.discipline | Engineering | en_US |