|dc.description.abstract||Modern database systems use a query optimizer to identify the most efficient strategy, called "plan", to execute declarative SQL queries. Optimization is a mandatory exercise since the difference between the cost of best plan and a random choice could be in orders of magnitude. The role of query optimization is especially critical for the decision support queries featured in data warehousing and data mining applications.
For a query on a given database and system configuration, the optimizer's plan choice is primarily a function of the selectivities of the base relations participating in the query. A pictorial enumeration of the execution plan choices of a database query optimizer over this relational selectivity space is called a "plan diagram". It has been shown recently that these diagrams are often remarkably complex and dense, with a large number of plans covering the space. An interesting research problem that immediately arises is whether complex plan diagrams can be reduced to a significantly smaller number of plans, without materially compromising the query processing quality. The motivation is that reduced plan diagrams provide several benefits, including quantifying the redundancy in the plan search space, enhancing the applicability of parametric query optimization, identifying error-resistant and least-expected-cost plans, and minimizing the overhead of multi-plan approaches.
In this thesis, we investigate the plan diagram reduction issue from theoretical, statistical and empirical perspectives. Our analysis shows that optimal plan diagram reduction, w.r.t. minimizing the number of plans in the reduced diagram, is an NP-hard problem, and remains so even for a storage-constrained variation. We then present CostGreedy, a greedy reduction algorithm that has tight and optimal performance guarantees, and whose complexity scales linearly with the number of plans in the diagram. Next, we construct an extremely fast estimator, AmmEst, for identifying the location of the best tradeoff between the reduction in plan cardinality and the impact on query processing quality. Both CostGreedy and AmmEst have been incorporated in the publicly-available Picasso optimizer visualization tool.
Through extensive experimentation with benchmark query templates on industrial-strength database optimizers, we demonstrate that with only a marginal increase in query processing costs, CostGreedy reduces even complex plan diagrams running to hundreds of plans to "anorexic" levels (small absolute number of plans). While these results are produced using a highly conservative upper-bounding of plan costs based on a cost monotonicity constraint, when the costing is done on "actuals" using remote plan costing, the reduction obtained is even greater - in fact, often resulting in a single plan in the reduced diagram. We also highlight how anorexic reduction provides enhanced resistance to selectivity estimate errors, a long-standing bane of good plan selection.
In summary, this thesis demonstrates that complex plan diagrams can be efficiently converted to anorexic reduced diagrams, a result with useful implications for the design and use of next-generation database query optimizers.||en