dc.contributor.advisor | Kolathaya, Shishir N Y | |
dc.contributor.advisor | Bhatnagar, Shalabh | |
dc.contributor.author | Saxena, Naman | |
dc.date.accessioned | 2023-08-02T07:11:34Z | |
dc.date.available | 2023-08-02T07:11:34Z | |
dc.date.submitted | 2023 | |
dc.identifier.uri | https://etd.iisc.ac.in/handle/2005/6175 | |
dc.description.abstract | The average reward criterion is relatively less studied, as most existing works in the Reinforcement Learning literature consider the discounted reward criterion. A few recent works present on-policy average reward actor-critic algorithms, but the off-policy average reward actor-critic setting remains relatively unexplored. In this work, we present both on-policy and off-policy deterministic policy gradient theorems for the average reward performance criterion. Using these theorems, we present an Average Reward Off-Policy Deep Deterministic Policy Gradient (ARO-DDPG) algorithm. We first show asymptotic convergence using an ODE-based analysis. Subsequently, we provide a finite-time analysis of the resulting stochastic approximation scheme with a linear function approximator and obtain an $\epsilon$-optimal stationary policy with a sample complexity of $\Omega(\epsilon^{-2.5})$. We evaluate the average reward performance of the proposed ARO-DDPG algorithm on MuJoCo-based environments and observe better empirical performance than state-of-the-art on-policy average reward actor-critic algorithms. | en_US |
dc.language.iso | en_US | en_US |
dc.relation.ispartofseries | ;ET00188 | |
dc.rights | I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation | en_US |
dc.subject | Reinforcement Learning | en_US |
dc.subject | Actor-Critic Algorithm | en_US |
dc.subject | Stochastic Approximation | en_US |
dc.subject.classification | Research Subject Categories::TECHNOLOGY::Information technology::Computer science::Computer science | en_US |
dc.title | Average Reward Actor-Critic with Deterministic Policy Search | en_US |
dc.type | Thesis | en_US |
dc.degree.name | MTech (Res) | en_US |
dc.degree.level | Masters | en_US |
dc.degree.grantor | Indian Institute of Science | en_US |
dc.degree.discipline | Engineering | en_US |