    •   etd@IISc
    • Division of Interdisciplinary Research
    • Department of Computational and Data Sciences (CDS)

    Efficient and Resilient Stream Processing in Distributed Shared Environment

    View/Open
    Thesis full text (3.054Mb)
    Author
    Chaturvedi, Shilpa
    Abstract
    Internet of Things (IoT) deployments comprising sensors and actuators collect observational data and provide continuous streams of data, often called streaming data or fast data. Smart Cities use such IoT technologies to provide effective citizen services and to improve the efficiency of the utility infrastructure. Such Smart City applications need to analyze and process these data streams in near real-time to make decisions or provide public services. Distributed Stream Processing Systems (DSPS) like Apache Storm and Spark Streaming are data platforms tailored to handle streaming data and enable the composition of data flows that execute continuously over one or more such data streams. As the number of IoT applications that use shared urban sensor streams grows, these applications will increasingly perform duplicate pre-processing and analytics tasks. This offers the opportunity to collaboratively reuse the outputs of overlapping data flows, thereby improving resource efficiency. We propose data flow reuse algorithms that, given a submitted data flow, identify the intersection of reusable tasks and streams from existing data flows to form a merged data flow, with guaranteed equivalence of their output streams. We also propose algorithms to unmerge data flows when they are removed and to defragment partially reused data flows. We implement these algorithms for Apache Storm and validate their performance and resource savings using 86 real and synthetic data flows from the eScience and IoT domains. Our reuse strategies reduce the number of running tasks by 34 to 45% and the cumulative CPU usage by 29 to 63%. Including defragmentation of incremental data flows, reuse achieves monetary savings on Cloud resources of 36 to 44% compared to data flows without reuse, with limited redeployment overheads. As a further extension, we also explore the use of such reuse strategies to enhance the resilience of streaming data flows in a cyber-physical system that is exposed to external threats.
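To make the reuse idea in the abstract more concrete, here is a minimal sketch of signature-based task reuse with reference counting. It is not the thesis's algorithm. The `Task` and `MergedDataflow` names, the signature scheme and the example flows are hypothetical, and it assumes operators are deterministic so that the same operator, the same parameters and the same input streams always produce the same output stream. Under that assumption, two tasks with equal signatures can share a single running instance, and a task can only be stopped once no remaining data flow depends on it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Task:
    """Hypothetical task spec: operator, its parameters, and its inputs."""
    name: str          # local name, unique only within one data flow
    operator: str      # e.g. "parse", "range_filter"
    params: tuple      # hashable configuration, e.g. (("min", 0),)
    inputs: tuple      # upstream task names or external source stream IDs


class MergedDataflow:
    """Signature-based reuse of identical tasks across data flows."""

    def __init__(self):
        # task signature -> set of data flow IDs that depend on that task
        self.owners = {}

    @staticmethod
    def signatures(tasks):
        """Map each task name to a signature identifying its output stream.

        Assumes tasks are listed in topological order; any input name that is
        not an earlier task is treated as an external source stream.
        """
        sigs = {}
        for t in tasks:
            ins = tuple(sigs.get(i, ("source", i)) for i in t.inputs)
            sigs[t.name] = (t.operator, t.params, ins)
        return sigs

    def submit(self, flow_id, tasks):
        """Merge a data flow; return how many of its tasks were reused."""
        reused = 0
        for sig in self.signatures(tasks).values():
            if sig in self.owners:
                reused += 1
            self.owners.setdefault(sig, set()).add(flow_id)
        return reused

    def remove(self, flow_id):
        """Unmerge a data flow; return how many tasks can now be stopped."""
        stopped = 0
        for sig in list(self.owners):
            self.owners[sig].discard(flow_id)
            if not self.owners[sig]:
                del self.owners[sig]
                stopped += 1
        return stopped

    def running_tasks(self):
        return len(self.owners)


# Two hypothetical flows that share their parse and filter stages
# but use different local task names.
flow_a = [
    Task("p", "parse", (), ("taxi_stream",)),
    Task("f", "range_filter", (("min", 0), ("max", 100)), ("p",)),
    Task("a", "avg", (("window", 60),), ("f",)),
]
flow_b = [
    Task("parse1", "parse", (), ("taxi_stream",)),
    Task("flt", "range_filter", (("min", 0), ("max", 100)), ("parse1",)),
    Task("m", "max", (("window", 60),), ("flt",)),
]
```

Submitting `flow_a` and then `flow_b` reuses the two shared stages, so four tasks run instead of six. Removing `flow_a` afterwards stops only its `avg` task, because the parse and filter tasks are still needed by `flow_b`. The sketch does not cover defragmentation or redeployment on a real DSPS.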
When data flows run in a shared environment, they are even more susceptible to attacks and threats. Moving Target Defense (MTD) is a mitigation strategy that introduces spatiotemporal variations into a system to obfuscate it from attackers. We explore how some existing MTD techniques, together with our reuse strategies, can be adopted to provide resilience in DSPS.
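MTD covers many techniques. One common idea is to periodically migrate components, so that whatever an attacker has learned about where a task runs quickly becomes stale. The following minimal sketch shows that kind of spatial variation for task placement. It is not the technique used in the thesis. The function names, the per-epoch schedule and the rule that every task must move to a different host are hypothetical choices made for illustration.

```python
import random


def shuffle_placement(placement, hosts, rng):
    """Return a new task -> host map in which every task moves to another host.

    placement: current mapping of task name -> host name
    hosts: list of available hosts (must contain at least two)
    rng: a random.Random instance, so runs can be made reproducible
    """
    if len(hosts) < 2:
        raise ValueError("need at least two hosts to move tasks")
    return {
        task: rng.choice([h for h in hosts if h != host])
        for task, host in placement.items()
    }


def moving_target_schedule(placement, hosts, epochs, seed=0):
    """Yield one reshuffled placement per epoch (e.g. per reconfiguration period)."""
    rng = random.Random(seed)
    for _ in range(epochs):
        placement = shuffle_placement(placement, hosts, rng)
        yield placement
```

This only computes placements. Applying one to a running data flow (for instance through a custom Apache Storm scheduler) and limiting the resulting migration overhead are outside the scope of this sketch.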
    URI
    https://etd.iisc.ac.in/handle/2005/5299
    Collections
    • Department of Computational and Data Sciences (CDS) [102]

    etd@IISc is a joint service of SERC & J R D Tata Memorial (JRDTML) Library || Powered by DSpace software || DuraSpace
    Contact Us | Send Feedback | Thesis Templates
    Theme by 
    Atmire NV