Pattern classification using conjunctive conceptual clustering procedures
Abstract
This thesis examines the salient features of the conjunctive conceptual clustering algorithm CLUSTER/2 and describes new algorithms, also based on conjunctive concepts, that overcome some of the problems associated with CLUSTER/2.
Given a collection of events, some background knowledge, and a goal or purpose for clustering, CLUSTER/2 generates a classification composed of clusters of events and the corresponding conjunctive cluster descriptions. CLUSTER/2 starts from an arbitrarily selected initial seed set. Experimental studies showed that a proper selection of the initial seed set improves performance in terms of both classification and computational efficiency. This thesis designs a criterion capable of detecting a proper initial seed set and proposes an initial seed selection algorithm that uses this criterion.
CLUSTER/2 is intended to generate cluster descriptions that human beings can easily interpret when the data set consists of qualitative variables. For mixed variable data sets, however, it leads to an arbitrary classification. To circumvent this problem, a hybrid scheme is proposed that uses a suitable statistical clustering method in the first phase and a conceptual clustering algorithm such as CLUSTER/2 in the second phase. During the first phase, the mixed data set is transformed into completely qualitative data. Applying a conceptual clustering algorithm to the transformed data then yields a satisfactory classification. The hybrid scheme is applied to three different mixed variable data sets.
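The first phase of such a hybrid scheme can be illustrated with a minimal sketch. The abstract does not name the statistical clustering method, so the code below assumes a simple one-dimensional k-means applied to each numeric variable; the names `kmeans_1d`, `to_qualitative`, and the `level0`, `level1` labels are hypothetical and are not taken from the thesis.

```python
def kmeans_1d(values, k, iters=50):
    """Cluster scalar values into k groups; return one integer label per value.

    Assumption: plain 1-D k-means stands in for the unspecified
    "suitable statistical clustering method".
    """
    lo, hi = min(values), max(values)
    # Spread the initial centres evenly over the observed range (arbitrary choice).
    centres = [lo + (hi - lo) * (i + 0.5) / k for i in range(k)]
    labels = [0] * len(values)
    for _ in range(iters):
        # Assign each value to its nearest centre.
        labels = [min(range(k), key=lambda c: abs(v - centres[c])) for v in values]
        # Recompute each centre as the mean of its members (keep empty centres).
        updated = []
        for c in range(k):
            members = [v for v, lab in zip(values, labels) if lab == c]
            updated.append(sum(members) / len(members) if members else centres[c])
        if updated == centres:
            break
        centres = updated
    return labels


def to_qualitative(events, numeric_columns, k=2):
    """Replace every numeric column with a qualitative level label."""
    rows = [list(e) for e in events]
    for j in numeric_columns:
        labels = kmeans_1d([float(e[j]) for e in events], k)
        for row, lab in zip(rows, labels):
            row[j] = f"level{lab}"
    return [tuple(r) for r in rows]


# Hypothetical mixed data: one qualitative and one numeric variable.
events = [("red", 1.0), ("red", 1.2), ("blue", 9.8), ("blue", 10.1)]
qualitative_events = to_qualitative(events, numeric_columns=[1], k=2)
```

The resulting events contain only qualitative values and could then be passed to a conceptual clustering algorithm in the second phase.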
It is also observed that, with an appropriate data transformation technique, traditional numerical taxonomy methods can produce the same results as CLUSTER/2. To illustrate this, hierarchical linkage algorithms are applied to different data sets after the original variables have been encoded as binary variables.
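As a rough illustration of this idea, the sketch below encodes qualitative variables as binary indicators and then applies a linkage algorithm. The abstract does not specify the encoding or the linkage criterion, so one indicator per observed value, Hamming distance, and single linkage are assumptions made here; the function names are hypothetical.

```python
from itertools import combinations


def one_hot_encode(events):
    """Encode each qualitative variable as one binary indicator per observed value."""
    n_vars = len(events[0])
    # Sorted value domains make the encoding deterministic.
    domains = [sorted({e[j] for e in events}) for j in range(n_vars)]
    return [
        [1 if e[j] == v else 0 for j in range(n_vars) for v in domains[j]]
        for e in events
    ]


def hamming(a, b):
    """Number of positions in which two binary vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def single_linkage(vectors, n_clusters):
    """Agglomerative single-linkage clustering; returns clusters of event indices."""
    clusters = [[i] for i in range(len(vectors))]
    while len(clusters) > n_clusters:
        # Pick the pair of clusters whose closest members are nearest.
        a, b = min(
            combinations(range(len(clusters)), 2),
            key=lambda p: min(
                hamming(vectors[i], vectors[j])
                for i in clusters[p[0]]
                for j in clusters[p[1]]
            ),
        )
        clusters[a] = clusters[a] + clusters[b]
        del clusters[b]  # b > a, so index a is unaffected
    return sorted(sorted(c) for c in clusters)


# Hypothetical events described by three qualitative variables.
events = [
    ("red", "round", "small"),
    ("red", "round", "large"),
    ("blue", "square", "small"),
    ("blue", "square", "large"),
]
binary = one_hot_encode(events)
clusters = single_linkage(binary, n_clusters=2)
```

In this toy data the two recovered groups coincide with the conjunctive descriptions "red and round" and "blue and square", which is the kind of agreement the paragraph above refers to.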
CLUSTER/2 uses a direct (non-hierarchical) method to construct a hierarchical cluster structure in a divisive fashion. Transforming the results of a non-hierarchical algorithm into a hierarchical structure requires additional computation. Instead, an agglomerative hierarchical conceptual clustering algorithm (HCCA) is proposed, which generates a hierarchical structure directly by following the natural cluster growth. Because it avoids the transformation step, this approach requires less computation.

