
dc.contributor.advisor  Thathachar, M A L
dc.contributor.author  Lakshmi Varahan, S
dc.date.accessioned  2025-12-01T09:29:34Z
dc.date.available  2025-12-01T09:29:34Z
dc.date.submitted  1973
dc.identifier.uri  https://etd.iisc.ac.in/handle/2005/7549
dc.description.abstract  Reported in this thesis is a study of nonlinear learning algorithms, or reinforcement schemes, for multi-state stochastic automata acting in stationary random media with unknown response characteristics. Such stochastic automata have been widely used as models for learning systems. Depending on the nature of the response characteristics of the medium, three classes of reinforcement schemes are distinguished: P-model, Q-model, and S-model. Numerous reinforcement schemes have been reported in the literature for stochastic automata acting in both discrete and continuous time. First, a counterexample shows that the so-called optimal (P-model) schemes reported in the literature are not, in fact, optimal. A general reinforcement scheme of the reward-penalty type (under each of the three classes) for multi-state stochastic automata is considered. Such schemes generate a continuous-state-space, discrete-parameter Markov process, which in general may be (i) ergodic or (ii) absorbing. Only absorbing-model schemes are studied in this thesis. The existing definition of expediency is modified (weakened) to suit both ergodic- and absorbing-model schemes. A new concept of absolute expediency (monotonic decrease of the expected value of the average penalty) is introduced. It is shown that a very general but simple symmetry condition on the nonlinear functions figuring in the reinforcement scheme is necessary and sufficient for absolute expediency. A variety of schemes are simulated and compared on the basis of speed of convergence, and the simulations show that absolute expediency often leads to results close to optimality. While the reinforcement scheme updates the state probabilities of the stochastic automaton, a simultaneous estimation of the penalty probabilities of the medium using Bayesian techniques is proposed. This yields a fast estimate of the state of the automaton with the minimum penalty probability, together with a confidence level on that estimate. Absolute expediency only ensures that the scheme converges; a fast-converging scheme may well converge, with high probability, to an undesired (non-optimal) state. Bounds on the probability with which a given absorbing-model scheme converges to the desired (or undesired) state are derived for both two-state and multi-state stochastic automata. Further, it is shown that there is a trade-off between the speed of convergence and the bounds on the probability of converging to the desired (or undesired) state. A consequence of these results is that reinforcement schemes should be designed by considering both the speed of convergence and the bound on the probability of converging to the desired state. It is shown that, for nearly the same speed of convergence, nonlinear schemes have better bounds on the probability of converging to the desired state than the corresponding linear schemes; it is in the design of such nonlinear schemes that the concept of absolute expediency becomes very useful. Finally, an application of the learning algorithms to the hypothesis-testing problem is presented. Given upper bounds on the error probabilities of the two kinds, a design procedure is developed for devising a stochastic-automaton algorithm that ensures a proper decision. The method is illustrated by application to the detection of a known constant signal in additive Gaussian noise.
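
The following is a minimal Python sketch of the kind of learning automaton described in the abstract. It is not the thesis's own nonlinear (absolutely expedient) schemes: it uses a standard linear reward-penalty update for a P-model automaton in a stationary random medium, together with a simple Beta-posterior (Bayesian) estimate of the medium's penalty probabilities. The penalty probabilities c, the step sizes a and b, and the function name simulate are illustrative assumptions, not values or names taken from the thesis.

    import random

    # Sketch: P-model learning automaton with a linear reward-penalty update
    # acting in a stationary random medium, plus a Beta-posterior estimate of
    # each state's penalty probability. All numeric values are illustrative.

    def simulate(c, a=0.05, b=0.05, steps=5000, seed=0):
        rng = random.Random(seed)
        r = len(c)                    # number of automaton states (actions)
        p = [1.0 / r] * r             # state probability vector, updated online
        penalties = [0] * r           # penalty counts per state
        trials = [0] * r              # times each state was chosen

        for _ in range(steps):
            i = rng.choices(range(r), weights=p, k=1)[0]   # pick a state
            penalty = 1 if rng.random() < c[i] else 0      # medium's 0/1 response
            trials[i] += 1
            penalties[i] += penalty

            if penalty == 0:          # reward: shift probability toward state i
                p = [pj + a * (1.0 - pj) if j == i else (1.0 - a) * pj
                     for j, pj in enumerate(p)]
            else:                     # penalty: shift probability away from state i
                p = [(1.0 - b) * pj if j == i else b / (r - 1) + (1.0 - b) * pj
                     for j, pj in enumerate(p)]

        # posterior mean of each penalty probability under a Beta(1, 1) prior
        estimates = [(penalties[j] + 1) / (trials[j] + 2) for j in range(r)]
        return p, estimates

    if __name__ == "__main__":
        # hypothetical stationary medium with unknown penalty probabilities
        p, est = simulate(c=[0.7, 0.4, 0.2])
        print("final state probabilities:", [round(x, 3) for x in p])
        print("estimated penalty probabilities:", [round(x, 3) for x in est])

With these illustrative values the probability mass should drift toward the state with the smallest penalty probability, while the Beta-posterior means give a running estimate of the medium's penalty probabilities; the thesis's absolutely expedient schemes instead use nonlinear updating functions satisfying the symmetry condition described in the abstract.
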
dc.language.iso  en_US
dc.relation.ispartofseries  T01019
dc.rights  I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.
dc.subject  Reinforcement learning
dc.subject  Stochastic automata
dc.subject  Bayesian estimation
dc.title  Learning Algorithms for Stochastic Automata
dc.type  Thesis
dc.degree.name  PhD
dc.degree.level  Doctoral
dc.degree.grantor  Indian Institute of Science
dc.degree.discipline  Engineering

