Computer systems performance evaluation

Ferrari, D

View/Open

T01491.pdf (4.751Mb)

Author

Ferrari, D

Metadata

Show full item record

Abstract

The use of the mutual nearest neighbourhood concept for solving certain classes of problems in pattern recognition methodology is the main topic of study in this thesis. The concept of mutual nearest neighbourhood, v/hich takes into account the mutual nearness or two-v/ay nearness of two samples, can be considered as a generalization of the conventional nearest neighbourhood concept, which considers only one-way nearness. A new similarity measure, knovm as the mutual neighbourhood value (mnv), is defined as the sum of the conventional nearest neighbour ranks of two samples, with respect to each other. If two samples are the nearest neighbours of each other, they are said to form a mutual pair. Taking the point of view that clustering of two samples is a matter of mutual nearness rather than one-way nearness, the mutual neighbourhood value, which incorporates the notion of two-way nearness, is suggested as an effective similarity measure for clustering data. Using this similarity measure, two algorithms are developed, one agglomerative and the other divisive in nature. Extensive computer simulation experi ments are carried out to establish their efficacy and ver- satality in solving typical cluster problems, such as spherical and nonspherical clusters, linearly nonseparable ~ P f- B S T R A C T - i - clusters, clusters v/ith unequal populations, and clusters with low density bridges. The condensed nearest neighbour rule of Hart suffers from the drav/back of retaining, in the condensed subset, samples that are not near the decision boundary. A two-stage algorithm is proposed which obviates this disadvantage by growing the condensed subset using samples close to the decision line. Such samples are identified by their boundary neighbourhood values, defined using the concept of mutual nearest neighbourhood. The efficacy of the algorithm is brought out by means of comparison v;ith other methods. Two schemes are then sxiggested for editing and error correction in imperfectly supervised environment. The first scheme is based on the k-nearest neighbour rule. The second method uses a variant of the distance weighted k-nearest neighbour 2rule, the weights being determined by the mutual neighbourhood values. Both the schemes use the editing procedure as a stepping stone for the error correction to proceed. Some computer simulation experiments are performed and some observations are made based on their results. The concept of mutual nearest neighbourhood is then used for unsupervised learning of the parameters of the com ponents of a mixture of normal densities when the number of classes is also unknown. The unsupervised learning problem is formulated here as a multi-stage quasi-supervised 1 1 problem. The quasi-supervised environment is created by the mutualistic teacher by means of assigning identical but unknown labels to the individuals of mutual pairs. In each stage, the mutual pairs are replaced by their sample means. The number of classes are determined at an intermediate sta^-e. The upper and lower bounds on the mutualistic teacher risk in assigning identical labels to the individuals of mutual pairs are estimated. Results of some simulation studies are presented.

URI

https://etd.iisc.ac.in/handle/2005/8928

Collections

Electrical Engineering (EE) [451]