Algorithmic knowledge for a knowledge-based clustering environment.
Abstract
Clustering is a process of partitioning a given set of
objects into meaningful groups. It has potential applications
in many areas, including: 1. Image Segmentation,
2. Speech/Speaker Classification, 3. ECG Classification,
4. Geophysical Exploration, 5. Data Compression in Databases
and Knowledge Bases. Often, the user is confronted with the
selection of an appropriate algorithm for analyzing the data
at hand. Even for a knowledgeable user, this selection process
is complicated by the potentially large number of uncertain
factors that affect the process of clustering.
Factors contributing to the performance of various
clustering algorithms include: 1. Time and Space Complexity,
2. Versatility, 3. Contextual and Conceptual Knowledge. In this
thesis, a Divide-and-Conquer approach to clustering is
examined in detail. Results relating to its computational
efficiency are presented. The k-group admissibility property
of the proposed approach is established formally. Various
configurations of the methodology and their utilization for
different problems are examined.
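
The abstract does not spell out the Divide-and-Conquer procedure itself. As a rough illustration only, a generic two-level scheme can be sketched as follows: the data are split into blocks, each block is clustered on its own, and the resulting representatives are clustered again. The single-pass leader algorithm used here is a stand-in chosen for brevity, not the method of the thesis, and the names `leader_cluster`, `divide_and_conquer_leaders`, `block_size` and `threshold` are hypothetical.

```python
from math import dist


def leader_cluster(points, threshold):
    """Single-pass leader clustering (illustrative stand-in).

    A point becomes a new leader unless it lies within `threshold`
    of an existing leader. Returns the list of leaders.
    """
    leaders = []
    for p in points:
        if not any(dist(p, leader) <= threshold for leader in leaders):
            leaders.append(p)
    return leaders


def divide_and_conquer_leaders(points, block_size, threshold):
    """Two-level scheme: cluster each block, then cluster the representatives."""
    # Divide: split the data into consecutive blocks of at most `block_size` points.
    blocks = [points[i:i + block_size] for i in range(0, len(points), block_size)]
    # Conquer: cluster every block independently and pool the representatives.
    representatives = [rep for block in blocks
                       for rep in leader_cluster(block, threshold)]
    # Combine: cluster the (much smaller) set of representatives.
    return leader_cluster(representatives, threshold)


data = [(0, 0), (0.1, 0), (5, 5), (5.1, 5), (0, 0.1), (5, 5.1)]
final_leaders = divide_and_conquer_leaders(data, block_size=3, threshold=1.0)
```

Each block is processed without reference to the others, so the first level touches only `block_size` points at a time; this is the usual source of the time and space savings in such schemes.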
Hierarchical clustering algorithms are applicable for
data sets having non-isotropic clusters, and non-hierarchical
algorithms are meaningful in the context of isotropic
clusters. The K-d tree data structure is effectively used
for generating both the hierarchical and non-hierarchical
classifications. This is based on the efficient computation
of Nearest Neighbors using the K-d tree. A novel clustering
algorithm, which is based on a "Biased" nearest-neighbor
search, is also presented. This algorithm uses extrinsic
knowledge (in the form of the bias) to extend its versatility
of classification to both isotropic and non-isotropic
clusters.
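
For readers unfamiliar with the underlying search, the following is a minimal sketch of the standard (unbiased) nearest-neighbor query on a K-d tree. It is not the implementation of the thesis and does not model the "Biased" search, whose details are not given here; the function names `build_kd_tree` and `nearest_neighbor` and the tuple-based node layout are assumptions made for illustration.

```python
from math import dist


def build_kd_tree(points, depth=0):
    """Build a K-d tree as nested (point, left, right) tuples."""
    if not points:
        return None
    axis = depth % len(points[0])              # cycle through the coordinates
    ordered = sorted(points, key=lambda p: p[axis])
    mid = len(ordered) // 2                    # median point becomes the node
    return (ordered[mid],
            build_kd_tree(ordered[:mid], depth + 1),
            build_kd_tree(ordered[mid + 1:], depth + 1))


def nearest_neighbor(node, query, depth=0, best=None):
    """Return the stored point closest to `query` in Euclidean distance."""
    if node is None:
        return best
    point, left, right = node
    if best is None or dist(point, query) < dist(best, query):
        best = point
    axis = depth % len(query)
    diff = query[axis] - point[axis]
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest_neighbor(near, query, depth + 1, best)
    # Visit the far side only if the splitting plane is closer than the best match.
    if abs(diff) < dist(best, query):
        best = nearest_neighbor(far, query, depth + 1, best)
    return best


sample = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kd_tree(sample)
closest = nearest_neighbor(tree, (9, 2))
```

The pruning test on the splitting plane is what lets the query skip whole subtrees, which is the efficiency that nearest-neighbor-based clustering can draw on.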
Finally, the above-mentioned concepts are combined to
extract the necessary algorithmic knowledge for clustering.
The various components of the knowledge used in the formation
of a Knowledge-Based Clustering Environment are examined.
These include the algorithmic knowledge and the
cluster-structure knowledge. The proposed environment can be
effectively used for performing various tasks such as:
1. Classification of a given data set, 2. Answering of queries
regarding the structural properties of clusters, 3. Selecting
a clustering algorithm based on the user's specifications, and
4. Structuring a new algorithm based on the specification and
the data.
The proposed approaches have been applied to various
data sets including Fisher's Iris data set. The results are
encouraging. It is possible to implement the Knowledge-Based
Environment using Object-Oriented Representations.

