Learning automata algorithms for connectionist systems: local and global convergence
Abstract
Connectionist systems have been studied with much interest as models of the brain and as systems that operate in a parallel and distributed manner. These systems exhibit desirable properties such as learning capability and robustness.
In this thesis, learning algorithms for connectionist systems are developed and analysed. Connectionist systems based on the generalised reinforcement learning model are considered. This model consists of a learning system and an environment interacting with each other. The environment is characterised by a set of context vectors and a set of actions. At each instant the environment generates a context vector, which forms the input to the learning system. Based on its internal state and this context vector, the learning system outputs an action that affects the environment. This evokes a response from the environment indicating how suitable the action is for the particular context vector. The learning system then uses this response to update its internal state according to a learning algorithm; a rough sketch of this interaction loop is given below.
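The following is a purely illustrative sketch of the loop, assuming a two-context, two-action environment with made-up success probabilities and a simple reward-inaction style update; it is meant only to make the interaction concrete and is not one of the systems developed in the thesis (each context is represented here by its index rather than by a context vector).

```python
import numpy as np

rng = np.random.default_rng(0)

num_contexts = 2
num_actions = 2
# Probability that each action is "successful" for each context (unknown to the learner).
success_prob = [[0.2, 0.8],
                [0.9, 0.3]]

# Internal state: one action-probability vector per context.
p = [np.ones(num_actions) / num_actions for _ in range(num_contexts)]
lam = 0.05  # learning rate

for t in range(5000):
    c = rng.integers(num_contexts)                    # environment generates a context
    a = rng.choice(num_actions, p=p[c])               # learner chooses an action
    beta = float(rng.random() < success_prob[c][a])   # environment's response (1 = favourable)
    if beta == 1.0:                                    # reward-inaction style update of the internal state
        p[c] = p[c] * (1.0 - lam)
        p[c][a] += lam
    # no update on an unfavourable response

print([np.round(v, 2) for v in p])  # action 1 should be favoured for context 0, action 0 for context 1
```

Under such an update the probability vectors should concentrate on the better action for each context, which is the sense in which the optimal action is learnt.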
Algorithms to learn the optimal action for each context vector are considered here. The algorithms are completely decentralised and involve no information exchange among the various units of the system.
The problem of learning the optimal action is posed as an optimisation problem with respect to the internal state of the learning system. Then, using weak convergence techniques, the algorithm under consideration is approximated by an Ordinary Differential Equation (ODE) or a Stochastic Differential Equation (SDE). The ODE/SDE and the optimisation problem are studied together to show that the algorithm converges to a solution of the optimisation problem. Initially, local algorithms, that is, algorithms that converge to local solutions of the optimisation problem, are considered.
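In generic stochastic approximation terms (the notation below is illustrative rather than the thesis's own), an algorithm of the form
\[
\theta_{k+1} = \theta_k + \lambda\, H(\theta_k, x_k, \beta_k),
\]
where \(\theta_k\) is the internal state, \(x_k\) the context vector, \(\beta_k\) the reinforcement and \(\lambda\) a small step size, is approximated for small \(\lambda\) by the mean ODE
\[
\dot{\theta} = \bar{H}(\theta) = \mathrm{E}\bigl[\,H(\theta, x, \beta)\,\bigr],
\]
and the stable equilibria of this ODE are then related to (local) solutions of the optimisation problem. When a slowly decreasing noise term is added to the algorithm, the limit is a stochastic differential equation instead.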
In Chapter 2, the units that make up the connectionist system are composed of hierarchies of teams of learning automata. The Linear Reward-Inaction algorithm is used, and it is shown that the ODE approximating this algorithm has properties which indicate that it solves the associated optimisation problem locally.
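For reference, the standard Linear Reward-Inaction (\(L_{R-I}\)) update for a single automaton with action probability vector \(p(k)\), chosen action \(a_i\) and binary response \(\beta(k)\) (with \(1\) denoting reward) is
\[
p_j(k+1) =
\begin{cases}
p_j(k) + \lambda\,\beta(k)\,\bigl(1 - p_j(k)\bigr), & j = i,\\[2pt]
p_j(k) - \lambda\,\beta(k)\,p_j(k), & j \neq i,
\end{cases}
\]
so the probabilities are updated only on rewarded plays and remain unchanged on penalties; in the hierarchical teams considered in the chapter, each automaton would apply this update to its own probability vector.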
In Chapter 3, the REINFORCE algorithm is analysed and it is shown that the algorithm can lead to unbounded solutions. A new algorithm based on constrained optimisation is presented and analysed. It is shown that this algorithm overcomes the problem of unboundedness and also exhibits local convergence properties.
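The REINFORCE update referred to here has, in its standard form (baseline terms omitted and notation illustrative), the shape
\[
w_{k+1} = w_k + \alpha\,\beta_k\,\nabla_{w} \ln g\bigl(a_k \mid x_k, w_k\bigr),
\]
where \(g\) is the probability with which the unit emits action \(a_k\) given context \(x_k\) and weights \(w_k\), \(\beta_k\) is the reinforcement and \(\alpha\) a step size. Since the weights are unconstrained, nothing in the update itself keeps them bounded, which is the difficulty addressed by the constrained-optimisation algorithm.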
In many cases, local convergence properties are not sufficient and global convergence is needed. It is known that the Langevin equation can be used to solve global optimisation problems. Algorithms based on this idea are proposed to solve the optimisation problem globally.
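The Langevin equation referred to has the general form
\[
dX_t = -\nabla f(X_t)\,dt + \sigma(t)\,dW_t,
\]
where \(W_t\) is a standard Wiener process; with a suitably slowly decreasing noise level \(\sigma(t)\), the process concentrates around the global minima of \(f\) rather than being trapped in local minima, which is the basis for the global algorithms developed in Chapters 4 and 5.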
In Chapter 4, an algorithm based on the results of Chapter 3 is developed. It is shown that this algorithm can be approximated by the Langevin equation and thus possesses global convergence properties.
In Chapter 5, the techniques of Chapter 4 are applied to learning automata. Since it is difficult to apply these techniques to learning automata directly, the automata are parameterised and algorithms with global convergence properties are developed for the parameterised automata.
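As a purely illustrative example of such a parameterisation (the particular form used in the thesis is developed in Chapter 5 and is not reproduced here), the action probabilities of an automaton can be generated from unconstrained real parameters \(u\), for instance by
\[
p_i(u) = \frac{e^{u_i}}{\sum_{j} e^{u_j}},
\]
so that Langevin-type perturbations can be added to \(u\) while the resulting action probabilities always remain a valid probability vector.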
The algorithms developed here have been tested through simulations on pattern recognition problems, which bring out the convergence properties indicated by the analysis.
To sum up, this thesis shows that complex connectionist systems composed of learning automata units can be designed to have the desired properties of local and global convergence.

