Computer systems performance evaluation
Abstract
The use of the mutual nearest neighbourhood concept for
solving certain classes of problems in pattern recognition
methodology is the main topic of study in this thesis.
The concept of mutual nearest neighbourhood, v/hich
takes into account the mutual nearness or two-v/ay nearness
of two samples, can be considered as a generalization of the
conventional nearest neighbourhood concept, which considers
only one-way nearness. A new similarity measure, knovm as
the mutual neighbourhood value (mnv), is defined as the sum
of the conventional nearest neighbour ranks of two samples,
with respect to each other. If two samples are the nearest
neighbours of each other, they are said to form a mutual pair.
Taking the point of view that clustering of two samples
is a matter of mutual nearness rather than one-way nearness,
the mutual neighbourhood value, which incorporates the notion
of two-way nearness, is suggested as an effective similarity
measure for clustering data. Using this similarity measure,
two algorithms are developed, one agglomerative and the other
divisive in nature. Extensive computer simulation experi
ments are carried out to establish their efficacy and ver-
satality in solving typical cluster problems, such as
spherical and nonspherical clusters, linearly nonseparable
~ P f- B S T R A C T
- i -
clusters, clusters v/ith unequal populations, and clusters
with low density bridges.
The condensed nearest neighbour rule of Hart suffers
from the drav/back of retaining, in the condensed subset,
samples that are not near the decision boundary. A two-stage
algorithm is proposed which obviates this disadvantage by
growing the condensed subset using samples close to the
decision line. Such samples are identified by their boundary
neighbourhood values, defined using the concept of mutual
nearest neighbourhood. The efficacy of the algorithm is
brought out by means of comparison v;ith other methods.
Two schemes are then sxiggested for editing and error
correction in imperfectly supervised environment. The first
scheme is based on the k-nearest neighbour rule. The second
method uses a variant of the distance weighted k-nearest
neighbour 2rule, the weights being determined by the mutual
neighbourhood values. Both the schemes use the editing
procedure as a stepping stone for the error correction to
proceed. Some computer simulation experiments are performed
and some observations are made based on their results.
The concept of mutual nearest neighbourhood is then
used for unsupervised learning of the parameters of the com
ponents of a mixture of normal densities when the number
of classes is also unknown. The unsupervised learning
problem is formulated here as a multi-stage quasi-supervised
1 1
problem. The quasi-supervised environment is created by
the mutualistic teacher by means of assigning identical but
unknown labels to the individuals of mutual pairs. In each
stage, the mutual pairs are replaced by their sample means.
The number of classes are determined at an intermediate sta^-e.
The upper and lower bounds on the mutualistic teacher risk
in assigning identical labels to the individuals of mutual
pairs are estimated. Results of some simulation studies are
presented.

