Enhanced neural network architectures for pattern classification
Abstract
The thesis deals with the problem of pattern classification, for which new
algorithms using three different neural network models have been proposed:
(i) recursive; (ii) feedforward; and (iii) self-organizing networks. These
models are ‘enhanced’ to provide a more efficient solution to the
classification problem, in terms of speed, robustness and network size.
The first of these models (namely, the recursive network) is well known to
achieve pattern classification through a process called ‘pattern association’.
To improve the performance of recursive networks for pattern association, we
consider the following problems in their design:
— Given a set of patterns in the form of bipolar vectors, how to design a
recursive network that acts as an associative memory for these patterns,
with basins of attraction as large as possible?
— When all the given patterns cannot be accommodated in the network,
how to design the network such that a maximal number of patterns is
stored, again with basins of attraction as large as possible?
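For concreteness, the recall mechanism underlying both problems can be
illustrated with the classical Hebbian (outer-product) construction. The
sketch below is this textbook baseline, not the design method developed in
the thesis; the pattern sizes and tie-breaking rule are illustrative choices.

    import numpy as np

    def hebbian_weights(patterns):
        """Outer-product (Hebbian) weight matrix for bipolar patterns.
        `patterns` is an (m, n) array of +/-1 vectors; the diagonal is
        zeroed so that no neuron feeds back onto itself."""
        P = np.asarray(patterns, dtype=float)
        W = P.T @ P
        np.fill_diagonal(W, 0.0)
        return W

    def recall(W, x, max_iters=100):
        """Iterate x <- sign(W x) until a fixed point is reached. Probes
        that start inside a stored pattern's basin of attraction converge
        to that pattern. (Synchronous updates, for brevity; the modified
        dynamics proposed in Chapter 2 instead use a finite window of
        previous states at each node.)"""
        x = np.asarray(x, dtype=float)
        for _ in range(max_iters):
            x_new = np.sign(W @ x)
            x_new[x_new == 0] = 1.0    # break ties toward +1
            if np.array_equal(x_new, x):
                break
            x = x_new
        return x

    # Store two orthogonal 8-bit patterns; recall the first from a probe
    # corrupted in one bit.
    p1 = np.array([1, -1, 1, -1, 1, -1, 1, -1])
    p2 = np.array([1, 1, 1, 1, -1, -1, -1, -1])
    W = hebbian_weights([p1, p2])
    probe = p1.copy()
    probe[0] = -probe[0]
    assert np.array_equal(recall(W, probe), p1)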
We propose two methods to address these problems:
— Development of optimal learning algorithms, by showing the equivalence
between learning in recursive networks and learning in the (2-state)
perceptron. With this equivalence, optimal learning algorithms for the
perceptron can be applied directly to the problem of optimal learning in
recursive neural network models. In addition to the existing learning
algorithms for the perceptron, we propose two further algorithms for
optimal learning in the perceptron, based on the Thermal Perceptron
Learning Rule (TPLR) [1] (sketched after this list), and compare them as
applied to the problem of recursive network design.
— Modification of the dynamics of the network, using a finite number of
previous state values at each node. We show empirically that the new
network has larger regions of attraction around its equilibrium points,
and we establish convergence conditions for the network.
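A minimal single-unit sketch of a thermal-perceptron-style trainer, in the
spirit of [1]: the bias input is omitted, and the constants and the linear
annealing schedule are illustrative assumptions rather than the thesis's
settings.

    import numpy as np

    def thermal_perceptron(X, y, epochs=100, T0=1.0, eta=1.0, rng=None):
        """Train one threshold unit with a thermal-perceptron-style rule.
        X is an (m, n) array of inputs and y an (m,) array of +/-1
        targets. Corrections to misclassified patterns are damped by
        exp(-|phi|/T), so patterns that are deeply wrong (and probably
        unlearnable) disturb the weights least, which favours storing as
        many patterns as possible. T is annealed linearly to zero."""
        rng = np.random.default_rng(rng)
        m, n = X.shape
        w = rng.normal(scale=0.1, size=n)
        for epoch in range(epochs):
            T = T0 * (1.0 - epoch / epochs)      # linear annealing
            for i in rng.permutation(m):
                phi = w @ X[i]
                if y[i] * phi <= 0:              # misclassified
                    w += eta * y[i] * X[i] * np.exp(-abs(phi) / max(T, 1e-9))
        return w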
The second model (namely, the feedforward network) is the most widely used
network architecture for pattern classification. The input to these networks
is in the form of real-valued feature vectors. In the design of such networks,
the basic issue is to determine the appropriate network size for a given
problem: the network should be sufficiently large to realize the required
input-output mapping, but small enough to give good generalization results.
In the literature, constructive learning algorithms offer a solution to this
problem. These algorithms, instead of training a network of fixed size, start
with a small network and add neurons as and when required.
We develop new constructive learning algorithms for the construction of
feedforward neural networks whose size is near-optimal:
— Modified Upstart and Tower algorithms, which use the TPLR for training
the hidden neurons (a structural sketch of the Tower construction follows
this list); and
— A modification of the TPLR, called the Biased TPLR (BTPLR), for training
the hidden neurons in the design of networks using the paradigm of
Sequential Learning [2].
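To illustrate the constructive paradigm, the sketch below grows a Tower-style
cascade: each new threshold unit receives the original inputs plus the output
of the previous unit, and is trained with the thermal rule sketched earlier.
The stopping criterion and all details here are assumptions for illustration,
not the thesis's Modified Tower algorithm.

    import numpy as np

    def train_tower(X, y, max_units=10, **tplr_kwargs):
        """Grow a cascade of threshold units. Unit k is trained (here by
        thermal_perceptron, defined above) on the original inputs plus
        unit k-1's output; growth stops when a new unit no longer reduces
        the training error. Bias inputs are omitted for brevity."""
        units, prev_out = [], None
        best_err = np.inf
        for _ in range(max_units):
            Xk = X if prev_out is None else np.column_stack([X, prev_out])
            w = thermal_perceptron(Xk, y, **tplr_kwargs)
            out = np.sign(Xk @ w)
            out[out == 0] = 1.0
            err = np.mean(out != y)
            if err >= best_err:        # no improvement: discard unit, stop
                break
            units.append(w)
            best_err, prev_out = err, out
            if err == 0:
                break
        return units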
An advantage of the latter (namely, the BTPLR) is that it can be applied to
the construction of networks whose hidden neurons have activation functions
other than the threshold function, such as the window and cluster activation
functions. For window neurons, the BTPLR gives results marginally superior to
those of [3].
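The abstract does not define these activation functions; a common form of the
window activation, assumed here, fires when the net input falls inside an
interval, so a single unit responds to a slab of patterns between two parallel
hyperplanes rather than to a half-space.

    def window_activation(phi, center=0.0, radius=1.0):
        """Window (interval) activation: +1 when the net input phi lies
        within `radius` of `center`, -1 otherwise. Parameter names and
        defaults are illustrative."""
        return 1.0 if abs(phi - center) <= radius else -1.0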
We demonstrate, by simulation studies, the superiority of the proposed
algorithms (in terms of network complexity, conceptual simplicity, and ease
of implementation) over similar algorithms in the literature. Amongst the
algorithms for networks with threshold neurons, the Sequential Learning
Algorithm (SLA) with the BTPLR gives networks with the smallest number of
hidden neurons, and is faster than the SLA of [2]. The networks generated by
the Tower algorithm also give performance comparable to the SLA-based
algorithms, except when applied to the problem of random Boolean mappings.
With minor modifications, all the proposed algorithms can also be used for
problems with real-valued data and multiple classes.
Finally, based on the third model (namely, self-organization), a novel method
has been proposed for classifying patterns given in the form of 2-D binary
images. For each of the exemplar patterns, a network of neurons is
constructed, with its neurons arranged in exactly the same way as the
corresponding exemplar. Each network is mapped onto the given test pattern
using a self-organization scheme, and measures of this mapping are proposed
on the basis of which classification is carried out. The technique has been
applied to the problem of object recognition, and the results obtained show
its efficacy and robustness.
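The abstract leaves the self-organization scheme and the mapping measures
unspecified; purely as an illustration of the idea, the sketch below pulls a
copy of each exemplar's neuron grid toward the test image's ‘on’ pixels and
uses the mean residual distance as a hypothetical mapping measure. The scheme
and measures of Chapter 4 may differ.

    import numpy as np

    def map_exemplar(exemplar_pts, test_pts, iters=50, eta0=0.5):
        """Pull neurons placed at the exemplar's pixel coordinates toward
        their nearest test-image pixels with a decaying step size, then
        return the mean residual distance as a mapping measure."""
        neurons = np.asarray(exemplar_pts, dtype=float).copy()
        test = np.asarray(test_pts, dtype=float)
        for t in range(iters):
            eta = eta0 * (1.0 - t / iters)
            d = np.linalg.norm(neurons[:, None, :] - test[None, :, :], axis=2)
            neurons += eta * (test[d.argmin(axis=1)] - neurons)
        d = np.linalg.norm(neurons[:, None, :] - test[None, :, :], axis=2)
        return d.min(axis=1).mean()

    def classify(test_pts, exemplars):
        """Assign the test pattern to the exemplar (a dict of name ->
        pixel-coordinate array) with the smallest mapping measure."""
        scores = {name: map_exemplar(pts, test_pts)
                  for name, pts in exemplars.items()}
        return min(scores, key=scores.get)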
The thesis is organized as follows: Chapter 1 gives an introduction to the
three neural network models discussed in the thesis. Techniques for improving
the performance of the recursive networks for pattern association are discussed
in Chapter 2. In Chapter 3, we propose new constructive learning algorithms
for the design of feedforward neural networks for pattern classification. A new
self-organizing architecture for pattern recognition is proposed in Chapter 4. The
thesis ends with conclusions in Chapter 5.