|dc.description.abstract||In this thesis, we study both theoretical and practical aspects of decision making, with a focus on reinforcement learning based methods. Reinforcement learning (RL) is a form of semi-supervised learning in which the agent learns the decision making strategy by interacting with its environment. An RL problem is modelled mathematically using the framework of Markov Decision Processes (MDPs). We develop novel reinforcement learning algorithms and study decision problems in the domains of cloud computing, crowdsourcing and predictive analytics. A brief description of the various studies conducted in this thesis is given below.
Generalized Speedy Q-learning
In the first part of the thesis, we seek to improve the finite time performance bounds of the well-known Speedy Q-learning algorithm in model-free tabular reinforcement learning. We develop an algorithm named Generalized Speedy Q-learning that integrates ideas from Speedy Q-learning and the generalized Bellman equation to derive a simple and efficient update rule. We show that the finite time bound for this algorithm is better than that of Speedy Q-learning for MDPs with a special structure. Further, we extend our algorithm to deal with large state and action spaces by proposing the use of linear function approximation. Experimental results in random MDPs as well as benchmark problems demonstrate the effectiveness of the proposed algorithms.
SOR-DQN algorithm and its application in Cloud Computing
Extending the idea in the above algorithm, we develop a novel Deep Reinforcement Learning algorithm by combining the technique of successive over-relaxation with Deep Q-networks (DQN). The new algorithm, named SOR-DQN, uses modified targets in the DQN framework with the aim of accelerating training. We study the application of SOR-DQN in the problem of auto-scaling resources for cloud applications, for which existing algorithms suffer from issues such as slow convergence, poor performance during the training phase and non-scalability. It is seen that SOR-DQN achieves lower training costs as compared to DQN for this problem. Moreover, the generalization ability of the algorithm is shown to be superior to that of DQN using a set of Atari video games.
Efficient budget allocation and task assignment in Crowdsourcing
Next, we consider an interesting research problem in the domain of crowdsourcing - that of efficiently allocating a fixed budget among a set of tasks with varying difficulty levels. Further, the assignment of tasks to workers with different skill levels is tackled. This problem is modelled in the RL framework and an approximate solution is proposed to deal with the exploding state space. The solution achieves good accuracy while minimizing the overall completion time of tasks. To the best of our knowledge, this is the first work that ties together the twin objectives of maximising accuracy and minimising completion time under a budget constraint and a dynamic worker pool in crowdsourcing.
Predictive and Prescriptive Analytics for Performance Optimization
We also study the following problem in predictive analytics: predicting the future values of system parameters well in advance for a large-scale software or industrial system, which is important for avoiding disruptions. An equally challenging and useful exercise is to identify the `important' parameters and optimize them in order to attain good system performance. We describe a generic framework, along with specific methods, for this data analytics problem and present a case study on a large-scale enterprise system. The proposed method combines techniques from machine learning, causal analysis, time-series analysis and stochastic optimization to achieve accurate prediction (estimating future values of parameters) and reliable prescription (controlling independent parameters to optimize system performance).||en_US