Learning Algorithms for Stochastic Automata
Abstract
This thesis reports a study of nonlinear learning algorithms, or reinforcement schemes, for multi-state stochastic automata acting in stationary random media with unknown response characteristics. Such stochastic automata have been widely used as models of learning systems.
Depending on the nature of the response characteristics of the medium, three classes of models - the P-model, the Q-model, and the S-model - are distinguished, and reinforcement schemes are formulated under each. Numerous reinforcement schemes have been reported in the literature for stochastic automata acting in both discrete and continuous time. First, it is shown by a counterexample that the so-called optimal (P-model) schemes reported in the literature are not in fact optimal.
A general reinforcement scheme of the reward-penalty type (under each of the three classes) for multi-state stochastic automata is considered. These schemes generate a continuous-state, discrete-parameter Markov process, which in general can be (i) ergodic or (ii) absorbing. Only absorbing-model schemes are studied in this thesis.
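As a point of reference for how a reward-penalty scheme of this type operates, the following is a minimal sketch of the classical linear reward-penalty update for a P-model automaton with r states; the step sizes a and b, the penalty probabilities c, and the function names are illustrative assumptions, and the nonlinear absorbing-model schemes studied in the thesis generalize this form.

    import random

    def reward_penalty_update(p, i, penalized, a=0.05, b=0.02):
        """One step of a linear reward-penalty P-model scheme (illustrative).
        p: state probabilities, i: state chosen at this step,
        penalized: True if the medium returned a penalty (response 1)."""
        r = len(p)
        if not penalized:
            # Reward: shift probability toward the chosen state.
            q = [(1 - a) * pj for pj in p]
            q[i] = p[i] + a * (1 - p[i])
        else:
            # Penalty: shift probability away from the chosen state.
            q = [b / (r - 1) + (1 - b) * pj for pj in p]
            q[i] = (1 - b) * p[i]
        return q

    def run(c, steps=1000):
        """Automaton interacting with a stationary P-model medium whose
        (unknown) penalty probabilities are c."""
        r = len(c)
        p = [1.0 / r] * r
        for _ in range(steps):
            i = random.choices(range(r), weights=p)[0]
            p = reward_penalty_update(p, i, random.random() < c[i])
        return p

Both branches of the update preserve the sum of the state probabilities, so p remains a probability vector at every step.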
The existing definition of expediency is modified (weakened) to suit both the ergodic and absorbing model schemes. A new concept of absolute expediency (monotonic decrease of the expected value of the average penalty) is introduced. It is shown that a very general but simple condition of symmetry of the nonlinear functions figuring in the reinforcement scheme is necessary and sufficient for absolute expediency. A variety of schemes are simulated and compared on the basis of speed of convergence. It is shown through simulation that absolute expediency often leads to results which are close to optimality.
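In the notation standard in the learning-automata literature (assumed here rather than quoted from the thesis), with state probabilities p_i(n) and penalty probabilities c_i, the average penalty and the absolute expediency condition can be written as

    M(n) = \sum_{i=1}^{r} c_i \, p_i(n), \qquad
    \mathbb{E}\left[\, M(n+1) \mid p(n) \,\right] < M(n),

the inequality being required to hold for every n, every interior state-probability vector p(n), and every set of penalty probabilities \{c_i\} that are not all equal.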
While the reinforcement scheme updates the state probabilities of the stochastic automaton, a simultaneous estimation of the penalty probabilities of the medium using Bayesian techniques is proposed. This leads to a fast estimate of the state of the automaton with the minimum penalty probability and also provides a confidence level for this estimate.
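A natural way to realize such a Bayesian estimate for a P-model medium, sketched here under the assumption of independent Beta priors on the penalty probabilities (the exact estimator and confidence measure used in the thesis may differ), is the conjugate Beta-Bernoulli update:

    import random

    class PenaltyEstimator:
        """Beta-Bernoulli posterior over each state's penalty probability."""
        def __init__(self, r, alpha=1.0, beta=1.0):
            self.alpha = [alpha] * r   # pseudo-counts of penalties
            self.beta = [beta] * r     # pseudo-counts of non-penalties

        def observe(self, i, penalized):
            if penalized:
                self.alpha[i] += 1
            else:
                self.beta[i] += 1

        def best_state(self):
            """Index with the smallest posterior-mean penalty probability."""
            means = [a / (a + b) for a, b in zip(self.alpha, self.beta)]
            return min(range(len(means)), key=means.__getitem__)

        def confidence(self, samples=1000):
            """Monte Carlo posterior probability that best_state() really has
            the minimum penalty probability."""
            best = self.best_state()
            wins = 0
            for _ in range(samples):
                draw = [random.betavariate(a, b)
                        for a, b in zip(self.alpha, self.beta)]
                if min(range(len(draw)), key=draw.__getitem__) == best:
                    wins += 1
            return wins / samples

The observe calls can be driven by the same state selections and medium responses that drive the reinforcement scheme, so the estimate comes at no extra sampling cost.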
Absolute expediency only ensures that the scheme converges; a fast-converging scheme may well converge to an undesired (non-optimal) state with high probability. Bounds on the probability with which a given absorbing-model scheme converges to the (un)desired state are derived for both two-state and multi-state stochastic automata. Further, it is shown that there is a trade-off between the speed of convergence and the bounds on the probability of converging to the (un)desired state.
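The flavour of this trade-off can be seen in a small Monte Carlo experiment with the classical two-state linear reward-inaction scheme (an absorbing scheme); the penalty probabilities, step sizes, and run lengths below are illustrative assumptions, not the analytical bounds derived in the thesis.

    import random

    def reward_inaction_trial(c, a, steps=5000):
        """One run of a two-state linear reward-inaction automaton.
        Returns True if it ends up favouring the state with the smaller
        penalty probability (the desired state)."""
        p0 = 0.5                        # probability of choosing state 0
        for _ in range(steps):
            i = 0 if random.random() < p0 else 1
            penalized = random.random() < c[i]
            if not penalized:           # reward: reinforce the chosen state
                p0 = p0 + a * (1 - p0) if i == 0 else p0 * (1 - a)
            # on a penalty nothing is changed (the "inaction" part)
        desired = 0 if c[0] < c[1] else 1
        return (p0 > 0.5) == (desired == 0)

    def convergence_probability(c, a, runs=500):
        return sum(reward_inaction_trial(c, a) for _ in range(runs)) / runs

    # Larger step sizes a converge faster but, in general, lower the
    # probability of being absorbed at the desired state, e.g.
    # [convergence_probability([0.2, 0.6], a) for a in (0.01, 0.05, 0.2)]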
A consequence of these results is that the reinforcement schemes should be designed by considering both the speed of convergence and the bound on the probability of converging to the desired state. It is shown that for nearly the same speed of convergence, nonlinear schemes have better bounds on the probability of converging to the desired state than the corresponding linear schemes. It is in the design of such nonlinear schemes that the concept of absolute expediency becomes very useful.
Finally, an application of the learning algorithms to the hypothesis testing problem is presented. Given upper bounds on the error probabilities of the two kinds, a design procedure is developed for devising a stochastic-automaton algorithm that ensures a proper decision. The method is illustrated by application to a simple problem of detecting a known constant signal in additive Gaussian noise.
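As a rough illustration of how such an automaton can be coupled to the detection problem, the sketch below uses two states (the two hypotheses), a per-sample threshold at half the signal amplitude to generate the medium's penalty response, and a stopping level on the state probability; the threshold, step size, and stopping level are illustrative assumptions and merely stand in for the design procedure (driven by the prescribed error-probability bounds) developed in the thesis.

    import random

    def sample(signal_present, theta=1.0, sigma=1.0):
        """One observation: theta + Gaussian noise under H1, noise only under H0."""
        return (theta if signal_present else 0.0) + random.gauss(0.0, sigma)

    def penalty(decision, signal_present, theta=1.0):
        """Penalty when a fresh observation contradicts the chosen hypothesis,
        judged by the midpoint threshold theta/2 (an assumed rule)."""
        says_signal = sample(signal_present, theta) > theta / 2.0
        return says_signal != (decision == 1)

    def detect(signal_present, a=0.05, p_stop=0.99, max_steps=100000):
        """Two-state reward-inaction automaton: state 1 = 'signal present',
        state 0 = 'noise only'. Stops once one state probability exceeds
        p_stop, which would be set from the error-probability bounds."""
        p1 = 0.5
        for _ in range(max_steps):
            d = 1 if random.random() < p1 else 0
            if not penalty(d, signal_present):     # reward the chosen decision
                p1 = p1 + a * (1 - p1) if d == 1 else p1 * (1 - a)
            if p1 > p_stop or 1 - p1 > p_stop:
                break
        return 1 if p1 > 0.5 else 0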

