Studies in learning and representation in connectionist networks
Abstract
This work deals with questions related to learning and representation in connectionist networks. The main issue taken up in learning is the convergence of local learning rules for feedforward networks. We ask the following question:
Learning:
Are there universal (i.e., capable of representing all reasonable functions) feedforward architectures in which the convergence of local learning rules can be proved? By “local learning rule” we mean rules that (1) do not use information about units to which they are not directly connected, and (2) do not have large memory requirements.
We propose a network architecture based on finite Fourier series to answer the above question in the affirmative. In the process, we extend the usual notion of hetero-associative learning to that of learning functions. We also propose an attractor model for content-addressable memory (a recurrent network) in which learning takes place in the feedforward configuration. Convergence of the learning process is proved in all the models. We also present some peripheral studies on tensor product networks, sequence generation, and alternative learning rules.
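The flavor of such an architecture can be sketched as follows. This is a minimal illustration, not the thesis's construction verbatim: the Fourier order, learning rate, and square-wave target are assumptions for the example. Because the output is linear in the trainable weights, the delta (LMS) rule is local — each weight update uses only its own unit's activation and the output error — and its convergence for small learning rates is a standard result.

```python
import numpy as np

def fourier_features(x, order):
    """Hidden-layer activations: 1, cos(kx), sin(kx) for k = 1..order."""
    feats = [np.ones_like(x)]
    for k in range(1, order + 1):
        feats.append(np.cos(k * x))
        feats.append(np.sin(k * x))
    return np.stack(feats, axis=-1)

def train_lms(xs, ys, order, eta=0.05, epochs=200, seed=0):
    """Delta-rule training of the output weights (local update)."""
    rng = np.random.default_rng(seed)
    w = rng.normal(scale=0.1, size=2 * order + 1)
    for _ in range(epochs):
        for x, y in zip(xs, ys):
            phi = fourier_features(x, order)
            err = y - phi @ w
            w += eta * err * phi  # each weight sees only its own activation and the error
    return w

# Learn a periodic target (square wave) from samples on [0, 2*pi).
xs = np.linspace(0.0, 2 * np.pi, 64, endpoint=False)
ys = np.sign(np.sin(xs))
w = train_lms(xs, ys, order=5)
pred = fourier_features(xs, 5) @ w
mse = float(np.mean((pred - ys) ** 2))
```

With order 5, the residual error is dominated by the truncation of the Fourier series itself rather than by the learning process.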
We next take up some representation problems in connectionist modeling. The specific problems taken up are:
Validation:
Consider the Hopfield model of content-addressable memory (CAM). This is an attractor model in which the network dynamics converge to an attractor for every input. The validation problem involves the following two aspects:
(1) Finding out, in a distributed fashion, whether an attractor corresponds to one of the stored patterns.
(2) Reducing the number of spurious attractors.
Multiple objects representation:
Most networks can represent only one object at a time. To represent multiple objects, the networks are usually replicated. This results in the replication of learned information (i.e., connection strengths) in addition to using a large number of units. We explore the following issues:
(1) Are there ways of representing multiple objects without incurring undue overhead in the number/complexity of units?
(2) What type of tasks can be performed by such networks?
We call these two representation problems since we expect their solution to be part of the intrinsic capability of the network structure rather than of network operation, i.e., dynamics. This turns out to be the case, since both problems can be solved at the level of activations. The solution to both is based on the notion of complex activations.
To solve the validation problem, we introduce complex numbers into the basic model—in this work, the Hopfield network. In the complex domain, the attractors can be both real (binary) and complex. We define the genuine attractors to be only the real ones; the spurious memories then correspond to real attractors that are not among the stored patterns. In such models, an input that flows into a complex attractor can be considered novel. This solves the first aspect of the problem mentioned above. To solve the second aspect, we have to show that the number of spurious memories is small. We propose two ways of complexifying the Hopfield model—the camouflage and the imaginary noise models—and show that the second aspect is also satisfied.
In the camouflage model, we introduce an output function for the units. This function is complex even for binary arguments. The learning rule, which involves the output function, results in complex connection strengths even when binary patterns are stored. In the imaginary noise model, the imaginary part of the connection strengths is noise. In this way, complex numbers can be introduced to the basic model.
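The imaginary noise construction can be illustrated with a small sketch. The network size, pattern count, and noise scale below are illustrative assumptions; the Hebbian real part follows the standard Hopfield prescription. Taking the imaginary part antisymmetric keeps the connection matrix conjugate-symmetric, and for a real (binary) state the real part of the net input is unaffected by the noise:

```python
import numpy as np

rng = np.random.default_rng(1)
N, P = 64, 3                               # illustrative size and pattern count

# Hebbian storage of P random binary (+/-1) patterns, zero self-connections.
patterns = rng.choice([-1.0, 1.0], size=(P, N))
H = (patterns.T @ patterns) / N
np.fill_diagonal(H, 0.0)

# Imaginary noise model (sketch): the imaginary part of the weights is
# small antisymmetric noise, so W = H + iA is conjugate-symmetric.
A = rng.normal(scale=0.05, size=(N, N))
A = (A - A.T) / 2.0                        # antisymmetric => W is Hermitian
W = H + 1j * A

hermitian = bool(np.allclose(W, W.conj().T))

# For a real state s, Re(W @ s) = H @ s: the imaginary noise does not
# disturb the binary update s -> sign(Re(W @ s)) on real states, so the
# stored patterns remain equilibria in this small example.
recalled = all(np.array_equal(np.sign(np.real(W @ xi)), xi) for xi in patterns)
```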
For both models, we calculate the number of binary equilibrium points, which we take to be roughly equal to the number of spurious memories. The performance of the camouflage model for small values of N (the dimensionality of the network) is qualitatively different from that for large N, due to certain approximations that can be made for small N. We show that for the camouflage model (for small N) and for the imaginary noise model:
(1) The number of spurious memories can be written in the form kc^N. For certain values of the parameters of the models, we have c < 1. Thus, for these parameter values, the model is satisfactory.
(2) The capacity—the number of patterns that can be made equilibrium points using the learning rule—is also satisfactory, though less than that of the Hopfield model.
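To see why c < 1 makes the count manageable, note that kc^N then decays exponentially in N even as the number of binary states 2^N explodes. The values of k and c below are purely illustrative, not the thesis's estimates:

```python
# Illustrative only: with k = 10 and c = 0.9, the count k * c**N of
# spurious memories shrinks rapidly as the dimensionality N grows.
k, c = 10.0, 0.9
counts = {N: k * c**N for N in (10, 50, 100)}
# At N = 100 the count is about 2.7e-4: essentially no spurious memories.
```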
As mentioned before, the multiple objects representation problem is also solved using complex activations. To store n objects, we choose n complex numbers, called basic phases, such that their possible sums are different. We assign one phase to each object to be represented. The activation pattern for the combination of objects is the superposition of the individual activation patterns. Since the basic phases are chosen such that the possible sums are different, different combinations of objects result in different activation patterns. It is difficult to choose the basic phases satisfying the above requirement. Hence, we restrict the scope of representation to those objects in which a feature is present in not more than two objects. This is called the duo mode of representation. We give examples of basic phases which can represent up to 5 duo patterns simultaneously. Though the number appears to be small, we indicate how sentences can be represented using this scheme.
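The duo scheme can be sketched numerically. The particular phase angles below are an assumed illustrative choice, not the thesis's; the sketch checks the duo-mode requirement that all sums of two distinct basic phases are different, and shows that a superposed activation can then be decoded back to its pair of objects by nearest-sum lookup:

```python
import numpy as np
from itertools import combinations

# Illustrative basic phases: unit complex numbers e^{i*theta} with assumed angles.
angles = [0.0, 0.7, 1.9, 3.1, 4.3]
phases = [np.exp(1j * t) for t in angles]

pairs = list(combinations(range(len(phases)), 2))
sums = {p: phases[p[0]] + phases[p[1]] for p in pairs}

# Minimum separation between any two pair-sums; a positive margin means
# every combination of two objects yields a distinct activation value.
vals = list(sums.values())
min_sep = min(abs(u - v) for u, v in combinations(vals, 2))

# Decoding: a feature shared by objects 1 and 3 carries the superposed
# activation phases[1] + phases[3]; nearest-sum lookup recovers the pair.
z = phases[1] + phases[3]
decoded = min(pairs, key=lambda p: abs(sums[p] - z))
```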
We take up an application of the proposed representation scheme—multiple content-addressable memories (MCAMs). In MCAMs, more than one pattern can be retrieved in the content-addressable mode; thus, the knowledge in the network is used to solve multiple problems without being replicated. As the basic network, we use an existing attractor CAM for storing sparse binary patterns. We indicate how three duo patterns can be retrieved simultaneously. We also prove the global stability of two complex Hopfield-like models (complex Hopfield-like models have conjugate-symmetric connection matrices); the proposed MCAM model belongs to one class of such models. We also present some probabilistic calculations on the performance of the network and verify them through simulations.
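The role of conjugate symmetry in such stability proofs rests on a one-line fact: for a Hermitian connection matrix W, the quadratic form s^H W s is real for every complex state s (since (s^H W s)^H = s^H W^H s = s^H W s), so a Hopfield-style energy remains a real-valued candidate Lyapunov function. A minimal numerical check, with a random W and state chosen purely for illustration:

```python
import numpy as np

rng = np.random.default_rng(2)
N = 32

# Random conjugate-symmetric (Hermitian) connection matrix: W = (M + M^H)/2.
M = rng.normal(size=(N, N)) + 1j * rng.normal(size=(N, N))
W = (M + M.conj().T) / 2.0

# Random complex activation state.
s = rng.normal(size=N) + 1j * rng.normal(size=N)

# Hopfield-style energy E = -1/2 s^H W s; np.vdot conjugates its first
# argument, so this computes the Hermitian quadratic form directly.
E = -0.5 * np.vdot(s, W @ s)
imag_part = float(abs(E.imag))   # zero (up to rounding) when W is Hermitian
```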