
dc.contributor.advisor  Thathachar, M A L
dc.contributor.author  Lakshmi Varahan, S
dc.date.accessioned  2025-12-01T09:29:34Z
dc.date.available  2025-12-01T09:29:34Z
dc.date.submitted  1973
dc.identifier.uri  https://etd.iisc.ac.in/handle/2005/7549
dc.description.abstract  Reported in this thesis is a study of nonlinear learning algorithms, or reinforcement schemes, for multi-state stochastic automata acting in stationary random media with unknown response characteristics. Such stochastic automata have been widely used as models for learning systems. Depending on the nature of the response characteristics of the medium, three classes of reinforcement schemes are distinguished: P-model, Q-model, and S-model. Numerous reinforcement schemes have been reported in the literature for stochastic automata acting in both discrete and continuous time. First, a counterexample shows that the so-called optimal (P-model) schemes reported in the literature are not, in fact, optimal. A general reinforcement scheme of the reward-penalty type (under each of the three classes) for multi-state stochastic automata is considered. Such schemes generate a continuous-state-space, discrete-parameter Markov process, which in general may be (i) ergodic or (ii) absorbing. Only absorbing-model schemes are studied in this thesis. The existing definition of expediency is modified (weakened) to suit both ergodic- and absorbing-model schemes. A new concept of absolute expediency (monotonic decrease of the expected value of the average penalty) is introduced. It is shown that a very general but simple symmetry condition on the nonlinear functions figuring in the reinforcement scheme is necessary and sufficient for absolute expediency. A variety of schemes are simulated and compared on the basis of speed of convergence, and the simulations show that absolute expediency often leads to results close to optimality. While the reinforcement scheme updates the state probabilities of the stochastic automaton, a simultaneous estimation of the penalty probabilities of the medium using Bayesian techniques is proposed. This yields a fast estimate of the state of the automaton with the minimum penalty probability, together with a confidence level on that estimate. Absolute expediency only ensures that the scheme converges; a fast-converging scheme may well converge, with high probability, to an undesired (non-optimal) state. Bounds on the probability with which a given absorbing-model scheme converges to the desired (or undesired) state are derived for both two-state and multi-state stochastic automata. Further, it is shown that there is a trade-off between the speed of convergence and the bounds on the probability of converging to the desired (or undesired) state. A consequence of these results is that reinforcement schemes should be designed by considering both the speed of convergence and the bound on the probability of converging to the desired state. It is shown that, for nearly the same speed of convergence, nonlinear schemes have better bounds on the probability of converging to the desired state than the corresponding linear schemes; it is in the design of such nonlinear schemes that the concept of absolute expediency becomes very useful. Finally, an application of the learning algorithms to the hypothesis-testing problem is presented. Given upper bounds on the error probabilities of the two kinds, a design procedure is developed for devising a stochastic-automaton algorithm that ensures a proper decision. The method is illustrated by application to the detection of a known constant signal in additive Gaussian noise.
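
The following is a minimal Python sketch of the kind of learning automaton described in the abstract. It is not the thesis's own nonlinear (absolutely expedient) schemes: it uses a standard linear reward-penalty update for a P-model automaton in a stationary random medium, together with a simple Beta-posterior (Bayesian) estimate of the medium's penalty probabilities. The penalty probabilities c, the step sizes a and b, and the function name simulate are illustrative assumptions, not values or names taken from the thesis.

    import random

    # Sketch: P-model learning automaton with a linear reward-penalty update
    # acting in a stationary random medium, plus a Beta-posterior estimate of
    # each state's penalty probability. All numeric values are illustrative.

    def simulate(c, a=0.05, b=0.05, steps=5000, seed=0):
        rng = random.Random(seed)
        r = len(c)                    # number of automaton states (actions)
        p = [1.0 / r] * r             # state probability vector, updated online
        penalties = [0] * r           # penalty counts per state
        trials = [0] * r              # times each state was chosen

        for _ in range(steps):
            i = rng.choices(range(r), weights=p, k=1)[0]   # pick a state
            penalty = 1 if rng.random() < c[i] else 0      # medium's 0/1 response
            trials[i] += 1
            penalties[i] += penalty

            if penalty == 0:          # reward: shift probability toward state i
                p = [pj + a * (1.0 - pj) if j == i else (1.0 - a) * pj
                     for j, pj in enumerate(p)]
            else:                     # penalty: shift probability away from state i
                p = [(1.0 - b) * pj if j == i else b / (r - 1) + (1.0 - b) * pj
                     for j, pj in enumerate(p)]

        # posterior mean of each penalty probability under a Beta(1, 1) prior
        estimates = [(penalties[j] + 1) / (trials[j] + 2) for j in range(r)]
        return p, estimates

    if __name__ == "__main__":
        # hypothetical stationary medium with unknown penalty probabilities
        p, est = simulate(c=[0.7, 0.4, 0.2])
        print("final state probabilities:", [round(x, 3) for x in p])
        print("estimated penalty probabilities:", [round(x, 3) for x in est])

With these illustrative values the probability mass should drift toward the state with the smallest penalty probability, while the Beta-posterior means give a running estimate of the medium's penalty probabilities; the thesis's absolutely expedient schemes instead use nonlinear updating functions satisfying the symmetry condition described in the abstract.
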
dc.language.iso  en_US
dc.relation.ispartofseries  T01019
dc.rights  I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.
dc.subject  Reinforcement learning
dc.subject  Stochastic automata
dc.subject  Bayesian estimation
dc.title  Learning Algorithms for Stochastic Automata
dc.type  Thesis
dc.degree.name  PhD
dc.degree.level  Doctoral
dc.degree.grantor  Indian Institute of Science
dc.degree.discipline  Engineering

