
dc.contributor.advisor: Simmhan, Yogesh
dc.contributor.author: Chaturvedi, Shilpa
dc.date.accessioned: 2021-09-16T06:59:24Z
dc.date.available: 2021-09-16T06:59:24Z
dc.date.submitted: 2018
dc.identifier.uri: https://etd.iisc.ac.in/handle/2005/5299
dc.description.abstract: Internet of Things (IoT) deployments comprising sensors and actuators collect observational data and provide continuous streams of data, often called streaming data or fast data. Smart Cities use such IoT technologies to provide effective citizen services and improve the efficiency of the utility infrastructure. Such Smart City applications need to analyze and process these data streams in near real-time to make decisions or provide public services. Distributed Stream Processing Systems (DSPS) like Apache Storm and Spark Streaming are data platforms tailored to handle streaming data and enable composition of data flows that execute constantly over one or more such data streams. As the number of IoT applications using shared urban sensor streams continues to grow, applications will perform duplicate pre-processing and analytics tasks. This offers the opportunity to collaboratively reuse the outputs of overlapping data flows, thereby improving resource efficiency. We propose data flow reuse algorithms that, given a submitted data flow, identify the intersection of reusable tasks and streams from existing data flows to form a merged data flow, with guaranteed equivalence of their output streams. We also propose algorithms to unmerge data flows when they are removed and to defragment partially reused data flows. We implement these algorithms for Apache Storm and validate their performance and resource savings using 86 real and synthetic data flows from eScience and IoT domains. Our reuse strategies reduce the number of running tasks by 34-45% and the cumulative CPU usage by 29-63%. Including defragmentation of incremental data flows achieves monetary savings on Cloud resources of 36-44% compared to data flows without reuse, and has limited redeployment overheads.
As a further extension, we also explore the use of such reuse strategies to enhance the resilience of streaming data flows in a cyber-physical system that is exposed to external threats. When data flows run in a shared environment, they are even more susceptible to attacks and threats. Moving Target Defense (MTD) is a mitigation strategy that introduces spatiotemporal variations into the system to obfuscate it from attackers. We explore how some of the existing MTD techniques and our reuse strategies can be adopted to provide resilience in DSPS. [en_US]
dc.language.iso: en_US [en_US]
dc.relation.ispartofseries: ;G29307
dc.rights: I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation. [en_US]
dc.subject: Internet of Things [en_US]
dc.subject: Smart City applications [en_US]
dc.subject: Distributed Stream Processing Systems [en_US]
dc.subject: Moving target Defense [en_US]
dc.subject.classification: Research Subject Categories::TECHNOLOGY::Information technology::Computer science [en_US]
dc.title: Efficient and Resilient Stream Processing in Distributed Shared Environment [en_US]
dc.type: Thesis [en_US]
dc.degree.name: MS [en_US]
dc.degree.level: Masters [en_US]
dc.degree.grantor: Indian Institute of Science [en_US]
dc.degree.discipline: Engineering [en_US]

