DIASPORA : A fully distributed web-query processing system

Ramanath, Maya

dc.contributor.advisor	Haritsa, Jayant R
dc.contributor.author	Ramanath, Maya
dc.date.accessioned	2025-10-07T10:51:53Z
dc.date.available	2025-10-07T10:51:53Z
dc.date.submitted	2000
dc.identifier.uri	https://etd.iisc.ac.in/handle/2005/7146
dc.description.abstract	The Web, with its vast, heterogeneous, and dynamic content, poses significant challenges for applying classical database technologies. The lack of structure, the presence of hyperlinks, and the absence of centralized control make traditional data modeling, querying, and processing approaches inadequate. This thesis presents DIASPORA, a web database system designed to address these challenges through an integrated solution encompassing data modeling, query language design, and distributed query processing. DIASPORA introduces a graph-based data model that captures both the content and hyperlink structure of web documents. It supports traditional formats like HTML and emerging semantic formats like XML. The model automatically infers semantic relationships using markup tags and element values, enabling fully automatic graph construction. A declarative query language allows users to specify keyword-based hints and hyperlink predicates, facilitating both content-based and structure-aware querying. DIASPORA’s most novel feature is its fully distributed query processing mechanism, which contrasts with conventional centralized approaches. Queries are shipped across web nodes, processed locally, and results are returned without requiring coordination from a master site. The system addresses key challenges in distributed query processing, including query completion, rewriting, termination, and result transmission. A Java-based prototype has been implemented and tested on IISc campus websites, demonstrating significant improvements in query quality and resource efficiency. DIASPORA is positioned to support a wide range of web applications, including search engine indexing, site mapping, and fine-grained querying of XML documents. Its distributed architecture also opens avenues for mining user queries to enhance public and commercial web services.
dc.language.iso	en_US
dc.relation.ispartofseries	T04749
dc.rights	I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.
dc.subject	Distributed Query Processing
dc.subject	Hyperlink Structure
dc.subject	Declarative Query Language
dc.title	DIASPORA : A fully distributed web-query processing system
dc.type	Thesis
dc.degree.name	MSc Engg
dc.degree.level	Masters
dc.degree.grantor	Indian Institute of Science
dc.degree.discipline	Engineering

Files in this item

Name: T04749.pdf
Size: 32.16 MB
Format: PDF

This item appears in the following Collection(s)

Supercomputer Education and Research Centre (SERC)
