Spatio-temporal Memories: Theory and Algorithms
Abstract
This thesis focuses primarily on building a systematic theory of spatio-temporal memories based on unsupervised learning and on exploring its applications. Contributions are also made to supervised learning techniques.
The self-organizing map (SOM) is a biologically inspired unsupervised learning paradigm with numerous applications in learning, clustering, and recalling spatial input patterns. The SOM is also used as a tool for visualizing a high-dimensional data space by projecting it onto a lower-dimensional space, typically two dimensions, and can be viewed as a vector quantization (VQ) technique. Since the SOM was first published, many variants of it have appeared and been used extensively in engineering applications. A notable variant is the neural gas (NGAS) algorithm.
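The standard competitive update underlying the SOM can be sketched in a few lines. The lattice size, decay schedules, and toy data below are illustrative choices, not values from this thesis; the core rule is the classical one: find the best-matching unit (BMU) and pull it and its lattice neighbours towards the input.

```python
import numpy as np

rng = np.random.default_rng(0)

# Toy data: 2-D points drawn from three tight clusters (illustrative values).
data = np.vstack([
    rng.normal(loc, 0.1, size=(100, 2))
    for loc in ([0, 0], [1, 0], [0, 1])
])

# A small 5x5 SOM lattice; weight vectors initialised at random.
grid = 5
weights = rng.random((grid, grid, 2))
coords = np.stack(np.meshgrid(np.arange(grid), np.arange(grid),
                              indexing="ij"), axis=-1)

n_iter = 2000
for t in range(n_iter):
    x = data[rng.integers(len(data))]
    # Best-matching unit (BMU): the node whose weight is closest to x.
    dist = np.linalg.norm(weights - x, axis=-1)
    bmu = np.unravel_index(np.argmin(dist), dist.shape)
    # Learning rate and neighbourhood radius decay over time (heuristic schedules).
    eta = 0.5 * (1 - t / n_iter)
    sigma = max(grid / 2 * (1 - t / n_iter), 0.5)
    # Gaussian neighbourhood function on the output lattice.
    lattice_d2 = np.sum((coords - np.array(bmu)) ** 2, axis=-1)
    h = np.exp(-lattice_d2 / (2 * sigma ** 2))
    # SOM update: pull the BMU and its lattice neighbours towards x.
    weights += eta * h[..., None] * (x - weights)
```

After training, nearby lattice nodes hold similar codewords, which is what makes the SOM useful both as a VQ codebook and as a two-dimensional visualization of the input space.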
Although the SOM and its variants work well in practice, the learning rules in the basic SOM are heuristic. Moreover, these techniques are well suited to clustering and recalling spatial patterns, but when the data are spatio-temporal they fail to learn the temporal dynamics along with the spatial proximities in the input space. The traditional approach to learning spatio-temporal patterns is to incorporate time on the output space of a SOM, together with heuristic update rules that work well in practice.
Inspired by the pioneering work of Alan Turing, who used reaction-diffusion equations to explain spatial pattern formation, we develop an analogous theoretical model for building a spatio-temporal memory to learn and recall temporal patterns. Using coupled reaction-diffusion equations, we develop a mathematical theory from first principles for constructing a spatio-temporal SOM (STSOM) and derive an update rule for learning based on the gradient of a potential function. The temporal plasticity effect observed during recall in response to the input dynamics is mathematically quantified.
We develop a systematic theory to reconstruct missing samples in a time series using a spatio-temporal memory. Towards this goal, a novel algorithm to estimate the Markov order of a given input sequence is proposed. The efficacy of the reconstruction technique is tested on synthetic and real data sets to validate the theory.
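The Markov-order estimation algorithm proposed in the thesis is not reproduced here, but the underlying question can be illustrated with a standard information-theoretic heuristic: the empirical conditional entropy H(X_t | X_{t-k}, ..., X_{t-1}) stops decreasing once the context length k reaches the true order. The chain parameters and tolerance below are illustrative assumptions, not values from the thesis.

```python
from collections import Counter

import numpy as np

rng = np.random.default_rng(3)

# Simulate a binary order-1 Markov chain with strong dependence on the
# previous symbol: P(next = 0 | current = 0) = 0.9, P(next = 0 | current = 1) = 0.1.
P = {0: 0.9, 1: 0.1}
seq = [0]
for _ in range(20000):
    seq.append(0 if rng.random() < P[seq[-1]] else 1)
seq = np.array(seq)

def cond_entropy(seq, k):
    """Empirical H(X_t | X_{t-k..t-1}) in bits, from k-gram counts."""
    ctx, joint = Counter(), Counter()
    for i in range(k, len(seq)):
        c = tuple(seq[i - k:i])
        ctx[c] += 1
        joint[c + (seq[i],)] += 1
    n = sum(joint.values())
    h = 0.0
    for key, cnt in joint.items():
        h -= (cnt / n) * np.log2(cnt / ctx[key[:k]])
    return h

H = [cond_entropy(seq, k) for k in range(4)]
# Order estimate: smallest k at which extra context stops reducing the
# conditional entropy (beyond a small tolerance for sampling noise).
order = next(k for k in range(3) if H[k] - H[k + 1] < 0.01)
```

For this chain the marginal entropy H[0] is about 1 bit, while H[1] drops to roughly the binary entropy of 0.9 and stays flat thereafter, so the estimate is order 1.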
Several applications require classification of the data by prioritizing some of the input attributes. The basic SOM and its variants treat the entire input without any priority over the attributes. We propose an unsupervised learning algorithm called the priority based soft vector quantization feature map (PSVQFM), along with an architecture that prioritizes attributes of the input data during clustering. We also present an analysis of the misclassification error and prove that the proposed algorithm is asymptotically optimal. Though the PSVQFM is not directly linked to spatio-temporal memories, the proposed algorithm can be folded within a spatio-temporal memory architecture that incorporates priority-based attributes within spatio-temporal sequences.
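The PSVQFM itself is defined in the thesis; the effect of prioritizing attributes can be sketched with a plain competitive VQ whose distance is weighted per attribute. The priority weights, data, and learning rate below are hypothetical illustrations: with a large weight on attribute 0, the clustering is driven by that attribute even though attribute 1 has much higher variance.

```python
import numpy as np

rng = np.random.default_rng(1)

# Hypothetical priority weights: attribute 0 dominates the distance metric.
priority = np.array([10.0, 0.1])

def weighted_dist2(x, codebook):
    # Priority-weighted squared distance from x to every codebook vector.
    return np.sum(priority * (codebook - x) ** 2, axis=-1)

# Toy data: two groups that differ only in attribute 0 (means 0.0 and 1.0);
# attribute 1 is noisy and uninformative in both groups.
data = np.vstack([
    rng.normal([0.0, 0.5], [0.05, 0.5], size=(200, 2)),
    rng.normal([1.0, 0.5], [0.05, 0.5], size=(200, 2)),
])

# Two codewords, initialised from one sample of each group.
codebook = data[[0, -1]].copy()
for _ in range(3000):
    x = data[rng.integers(len(data))]
    w = np.argmin(weighted_dist2(x, codebook))  # winner under the weighted metric
    codebook[w] += 0.05 * (x - codebook[w])     # move the winner towards x
```

The two codewords settle near attribute-0 values 0 and 1, i.e. the partition follows the prioritized attribute rather than the high-variance one.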
The learning rate is a crucial parameter governing the convergence rate of any learning algorithm. Most learning algorithms based on the stochastic gradient descent (SGD) method depend on heuristic choices of the learning rate. In this thesis, we derive improved bounds on the learning rate of SGD-based adaptive learning algorithms by analyzing the largest eigenvalue of the Hessian matrix from first principles. This idea can be explored for both supervised and unsupervised learning algorithms based on SGD.
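The thesis's bounds are not reproduced here, but the classical result they build on is easy to demonstrate: for a quadratic loss, full-batch gradient descent converges exactly when the learning rate is below 2 divided by the largest Hessian eigenvalue. The problem sizes and step sizes below are illustrative.

```python
import numpy as np

rng = np.random.default_rng(2)

# Quadratic model problem f(w) = 0.5 * ||A w - b||^2, whose Hessian is A^T A.
A = rng.normal(size=(50, 5))
b = rng.normal(size=50)
lam_max = np.linalg.eigvalsh(A.T @ A)[-1]  # largest Hessian eigenvalue

def run_gd(eta, steps=500):
    """Full-batch gradient descent on f with a fixed learning rate eta."""
    w = np.zeros(5)
    for _ in range(steps):
        w -= eta * (A.T @ (A @ w - b))
    return w

w_star = np.linalg.lstsq(A, b, rcond=None)[0]

# Classical result: convergence requires eta < 2 / lam_max.
w_inside = run_gd(1.0 / lam_max)   # inside the bound: converges to w_star
w_outside = run_gd(3.0 / lam_max)  # outside the bound: iterates blow up
```

A step size inside the bound recovers the least-squares solution, while one outside it diverges geometrically along the top Hessian eigendirection; this is the quantity the eigenvalue analysis in the thesis controls.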
We have also contributed novel ideas and techniques for supervised learning algorithms, summarized below.
We propose an artificial neural network (ANN) architecture, trained with the back-propagation algorithm, that non-linearly transforms a given probability density function (pdf) into a desired pdf. We also provide an estimate of the number of hidden neurons based on the input statistics. This approach is useful for density estimation problems.
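The smooth monotone map that such a network must approximate is the classical probability integral transform, y = F_Y^{-1}(F_X(x)). The check below does not use the thesis's ANN; it only verifies numerically, for an assumed uniform-to-exponential example, that this target transform produces samples with the desired pdf.

```python
import numpy as np

rng = np.random.default_rng(4)

# Target transform for Uniform(0,1) -> Exp(1): y = -ln(1 - x), i.e. the
# inverse exponential CDF composed with the uniform CDF.
x = rng.random(100_000)
y = -np.log1p(-x)

# The transformed samples should follow Exp(1), which has mean 1 and variance 1.
```

A trained network approximating this map would transform the input density into the desired one sample by sample, which is what makes the architecture useful for density estimation.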
The Hessian based bounds on learning rates for algorithms based on SGD and the supervised learning technique for pdf transformation are relegated to the Appendices to keep the flow of the thesis coherent.