
    DIASPORA : A fully distributed web-query processing system

    View/Open
    T04749.pdf (32.16Mb)
    Author
    Ramanath, Maya
    Abstract
    The Web, with its vast, heterogeneous, and dynamic content, poses significant challenges for applying classical database technologies. The lack of structure, the presence of hyperlinks, and the absence of centralized control make traditional data modeling, querying, and processing approaches inadequate. This thesis presents DIASPORA, a web database system designed to address these challenges through an integrated solution encompassing data modeling, query language design, and distributed query processing.

    DIASPORA introduces a graph-based data model that captures both the content and hyperlink structure of web documents. It supports traditional formats like HTML and emerging semantic formats like XML. The model automatically infers semantic relationships using markup tags and element values, enabling fully automatic graph construction. A declarative query language allows users to specify keyword-based hints and hyperlink predicates, facilitating both content-based and structure-aware querying.

    DIASPORA’s most novel feature is its fully distributed query processing mechanism, which contrasts with conventional centralized approaches. Queries are shipped across web nodes, processed locally, and results are returned without requiring coordination from a master site. The system addresses key challenges in distributed query processing, including query completion, rewriting, termination, and result transmission.

    A Java-based prototype has been implemented and tested on IISc campus websites, demonstrating significant improvements in query quality and resource efficiency. DIASPORA is positioned to support a wide range of web applications, including search engine indexing, site mapping, and fine-grained querying of XML documents. Its distributed architecture also opens avenues for mining user queries to enhance public and commercial web services.
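
The query-shipping idea in the abstract can be illustrated with a small simulation. This is only a sketch under stated assumptions, not the thesis's Java prototype or its query language: the `Node` class, the `run_query` and `build_demo_web` functions, the single-keyword match, the hop budget used for termination, and the per-node duplicate check are all hypothetical choices made for illustration. Each node evaluates the query against its own content, reports a match straight back to the user, and forwards the query along its hyperlinks, so no master site holds global state.

```python
from collections import deque
from dataclasses import dataclass, field


@dataclass
class Node:
    """A hypothetical web site: its text plus its outgoing hyperlinks."""
    url: str
    text: str
    links: list[str] = field(default_factory=list)
    seen_queries: set[str] = field(default_factory=set)  # local state only

    def process(self, query_id, keyword, hops_left):
        """Evaluate the query locally; return (match or None, forwards)."""
        if query_id in self.seen_queries:
            return None, []  # duplicate arrival of the same query: drop it
        self.seen_queries.add(query_id)
        match = self.url if keyword.lower() in self.text.lower() else None
        # Assumed termination rule: stop forwarding when the hop budget is spent.
        forwards = [(t, hops_left - 1) for t in self.links] if hops_left > 0 else []
        return match, forwards


def run_query(web, start_url, keyword, max_hops, query_id="q1"):
    """Simulate message passing; the deque stands in for the network."""
    in_flight = deque([(start_url, max_hops)])
    results = []  # matches each node sends directly to the user's site
    while in_flight:
        url, hops_left = in_flight.popleft()
        node = web.get(url)
        if node is None:
            continue  # dangling hyperlink
        match, forwards = node.process(query_id, keyword, hops_left)
        if match is not None:
            results.append(match)
        in_flight.extend(forwards)
    return results


def build_demo_web():
    """A four-site toy web with made-up content and links."""
    nodes = [
        Node("a", "Department home page", ["b", "c"]),
        Node("b", "Database systems lab", ["c", "a"]),
        Node("c", "Courses on database design", ["d"]),
        Node("d", "Database seminar archive", []),
    ]
    return {n.url: n for n in nodes}


if __name__ == "__main__":
    print(run_query(build_demo_web(), "a", "database", max_hops=2))
    # prints ['b', 'c', 'd']
```

In this toy setup the only bookkeeping is the `seen_queries` set kept at each node, which is what stops the query from looping around the `a` to `b` to `a` cycle. The hop budget is a deliberately crude stand-in for the termination problem the abstract lists among the challenges of distributed processing.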
    URI
    https://etd.iisc.ac.in/handle/2005/7146
    Collections
    • Computer Science and Automation (CSA) [442]

    etd@IISc is a joint service of SERC & J R D Tata Memorial (JRDTML) Library || Powered by DSpace software || DuraSpace
    Theme by 
    Atmire NV
     

     
