Balancing Money and Time for OLAP Queries on Cloud Databases

Sabih, Rafia

dc.contributor.advisor	Haritsa, Jayant R
dc.contributor.author	Sabih, Rafia
dc.date.accessioned	2017-12-16T08:28:04Z
dc.date.accessioned	2018-07-31T06:40:22Z
dc.date.available	2017-12-16T08:28:04Z
dc.date.available	2018-07-31T06:40:22Z
dc.date.issued	2017-12-16
dc.date.submitted	2016
dc.identifier.uri	https://etd.iisc.ac.in/handle/2005/2931
dc.identifier.abstract	https://etd.iisc.ac.in/static/etd/abstracts/3793/G27789-Abs.pdf	en_US
dc.description.abstract	Enterprise Database Management Systems (DBMSs) have to contend with resource-intensive and time-varying workloads, making them well-suited candidates for migration to cloud plat-forms { specifically, they can dynamically leverage the resource elasticity while retaining affordability through the pay-as-you-go rental interface. The current design of database engine components lays emphasis on maximizing computing efficiency, but to fully capitalize on the cloud's benefits, the outlays of these computations also need to be factored into the planning exercise. In this thesis, we investigate this contemporary problem in the context of industrial-strength deployments of relational database systems on real-world cloud platforms. Specifically, we consider how the traditional metric used to compare query execution plans, namely response-time, can be augmented to incorporate monetary costs in the decision process. The challenge here is that execution-time and monetary costs are adversarial metrics, with a decrease in one entailing a rise in the other. For instance, a Virtual Machine (VM) with rich physical resources (RAM, cores, etc.) decreases the query response-time, but is expensive with regard to rental rates. In a nutshell, there is a tradeoff between money and time, and our goal therefore is to identify the VM that others the best tradeoff between these two competing considerations. In our study, we pro le the behavior of money versus time for a given query, and de ne the best tradeoff as the \knee" { that is, the location on the pro le with the minimum Euclidean distance from the origin. To study the performance of industrial-strength database engines on real-world cloud infrastructure, we have deployed a commercial DBMS on Google cloud services. On this platform, we have carried out extensive experimentation with the TPC-DS decision-support benchmark, an industry-wide standard for evaluating database system performance. Our experiments demonstrate that the choice of VM for hosting the database server is a crucial decision, because: (i) variation in time and money across VMs is significant for a given query, (ii) no one VM offers the best money-time tradeoff across all queries. To efficiently identify the VM with the best tradeoff from a large suite of available configurations, we propose a technique to characterize the money-time pro le for a given query. The core of this technique is a VM pruning mechanism that exploits the property of partially ordered set of the VMs on their resources. It processes the minimal and maximal VMs of this poset for estimated query response-time. If the response-times on these extreme VMs are similar, then all the VMs sandwiched between them are pruned from further consideration. Otherwise, the already processed VMs are set aside, and the minimal and maximal VMs of the remaining unprocessed VMs are evaluated for their response-times. Finally, the knee VM is identified from the processed VMs as the one with the minimum Euclidean distance from the origin on the money-time space. We theoretically prove that this technique always identifies the knee VM; further, if it is acceptable to and a \near-optimal" knee by providing a relaxation-factor on the response-time distance from the optimal knee, then it is also capable of finding more efficiently a satisfactory knee under these relaxed conditions. We propose two favors of this approach: the first one prunes the VMs using complete plan information received from database engine API, and named as Plan-based Identification of Knee (PIK). On the other hand, to further increase the efficiency of the identification of the knee VM, we propose a sub-plan based pruning algorithm called Sub-Plan-based Identification of Knee (SPIK), which requires modifications in the query optimizer. We have evaluated PIK on a commercial system and found that it often requires processing for only 20% of the total VMs. The efficiency of the algorithm is further increased significantly, by using 10-20% relaxation in response-time. For evaluating SPIK , we prototyped it on an open-source engine { Postgresql 9.3, and also implemented it as Java wrapper program with the commercial engine. Experimentally, the processing done by SPIK is found to be only 40% of the PIK approach. Therefore, from an overall perspective, this thesis facilitates the desired migration of enterprise databases to cloud platforms, by identifying the VM(s) that offer competitive tradeoffs between money and time for the given query.	en_US
dc.language.iso	en_US	en_US
dc.relation.ispartofseries	G27789	en_US
dc.subject	Database Management Syatem (DBMS)	en_US
dc.subject	Virtual Machine	en_US
dc.subject	Google Cloud Services	en_US
dc.subject	Cloud Platforms	en_US
dc.subject	Cloud Databases	en_US
dc.subject	Cloud Query Processing Model	en_US
dc.subject	Multi-objective Query Optimization (MOQO)	en_US
dc.subject	Plan-based Identification of Knee (PIK )	en_US
dc.subject	Knee VM	en_US
dc.subject	Sub-Plan-based Identification of Knee (SPIK)	en_US
dc.subject.classification	Computational and Data Sciences	en_US
dc.title	Balancing Money and Time for OLAP Queries on Cloud Databases	en_US
dc.type	Thesis	en_US
dc.degree.name	MSc Engg	en_US
dc.degree.level	Masters	en_US
dc.degree.discipline	Faculty of Engineering	en_US