Show simple item record

dc.contributor.advisorHaritsa, Jayant
dc.contributor.authorShah, Dhrumilkumar
dc.date.accessioned2021-06-16T10:24:35Z
dc.date.available2021-06-16T10:24:35Z
dc.date.submitted2021
dc.identifier.urihttps://etd.iisc.ac.in/handle/2005/5163
dc.description.abstractThe Relational Database Management Systems (RDBMS) constitute the backbone of today's information-rich society, providing a congenial environment for handling enterprise data during its entire life cycle of generation, storage, maintenance, and processing. The Structured Query Language (SQL) is the standard interface to query the information present in RDBMS-based storage. Knowing the expected size of the SQL query result, measured in terms of the output row-cardinality, prior to execution can benefit both the RDBMS system and the user in several ways. The use-cases include assessing query feasibility, approximate query answering, query progress monitoring, and resource allocation strategies. In the context of our work, we define cardinality estimation as the estimation of the result size (number of rows in the output) of the given SQL query. Unfortunately, the histogram and sampling-based techniques commonly used in industrial database engines for cardinality estimation are often woefully inaccurate in practice. This lacuna has motivated a recent slew of papers advocating the use of machine-learning/deep-learning techniques for cardinality estimation. However, these new approaches have their own limitations regarding significant training effort, inability to handle dynamic data-updates, and generalization to unseen queries. In this work, we take a relook at the traditional random sampling and investigate whether they can be made to work satisfactorily when augmented with lightweight data structures. Specifically, we present GridSam – a Grid-based Dynamic Sampling technique, which essentially augments random sampling with histograms, incorporating both algorithmic and platform innovations. From the algorithmic perspective, GridSam first creates a multi-dimensional grid overlay by partitioning the data-space on “critical” attributes, and then performs dynamic sampling from the confined query-specific region of the grid to capture correlations. A greedy methodology targeted towards reducing the Zero Sample Problem occurrence is used to determine the set of “critical” attributes as the grid dimensions. Further, insights from the Index-based Join Sampling (IBJS) technique are leveraged to direct the sampling in multi-table queries. Finally, learned-indexes are incorporated to reduce the index-probing cost for join sampling during the estimation process. From the platform perspective, GridSam leverages the massive parallelism offered by current GPU architectures to provide fast grid setup times. This parallelism is also extended to the run-time estimation process. A detailed performance study on benchmark environments indicates that GridSam computes cardinality estimates with accuracies competitive to contemporary learning-based techniques. Moreover, it does so while achieving an orders-of-magnitude reduction in setup time. Further, the estimation time is in the same ballpark as both traditional and learning-based techniques. Finally, a collateral benefit of GridSam's simple and highly parallelizable design is that, unlike learned estimators, it is amenable to dynamic data environments with frequent data-updates.en_US
dc.language.isoen_USen_US
dc.rightsI grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertationen_US
dc.subjectCardinality Estimationen_US
dc.subjectRelational Database Management Systemen_US
dc.subjectCardinality Estimation in Database Systemsen_US
dc.subject.classificationResearch Subject Categories::TECHNOLOGY::Information technology::Computer scienceen_US
dc.titleRevisiting Statistical Techniques for Result Cardinality Estimationen_US
dc.typeThesisen_US
dc.degree.nameMTech (Res)en_US
dc.degree.levelMastersen_US
dc.degree.grantorIndian Institute of Scienceen_US
dc.degree.disciplineEngineeringen_US


Files in this item

This item appears in the following Collection(s)

Show simple item record