Algorithmic knowledge for a knowledge-based clustering environment.

Negi, Atul

Abstract

Clustering is the process of partitioning a given set of objects into meaningful groups. It has potential applications in many areas, including: 1. Image Segmentation, 2. Speech/Speaker Classification, 3. ECG Classification, 4. Geophysical Exploration, and 5. Data Compression in Databases and Knowledge Bases. Often, the user is confronted with the selection of an appropriate algorithm for analyzing the data at hand. Even for a knowledgeable user, this selection is complicated by the potentially large number of uncertain factors that affect the clustering process. Factors contributing to the performance of various clustering algorithms include: 1. Time and Space Complexity, 2. Versatility, and 3. Contextual and Conceptual Knowledge. In this thesis, a Divide-and-Conquer approach to clustering is examined in detail. Results relating to its computational efficiency are presented. The k-group admissibility property of the proposed approach is established formally. Various configurations of the methodology and their use for different problems are examined. Hierarchical clustering algorithms are applicable to data sets having non-isotropic clusters, and non-hierarchical algorithms are meaningful in the context of isotropic clusters. The K-d tree data structure is used effectively for generating both hierarchical and non-hierarchical classifications. This is based on the efficient computation of nearest neighbors using the K-d tree. A novel clustering algorithm based on a "biased" nearest-neighbor search is also presented. This algorithm uses extrinsic knowledge (in the form of the bias) to extend its versatility of classification to both isotropic and non-isotropic clusters. Finally, the above-mentioned concepts are combined to extract the necessary algorithmic knowledge for clustering. The various components of the knowledge used in the formation of a Knowledge-Based Clustering Environment are examined. These include the algorithmic knowledge and the cluster-structure knowledge. The proposed environment can be used effectively for performing tasks such as: 1. Classification of a given data set, 2. Answering queries regarding the structural properties of clusters, 3. Selecting a clustering algorithm based on the user's specifications, and 4. Structuring a new algorithm based on the specification and the data. The proposed approaches have been applied to various data sets, including Fisher's Iris data set. The results are encouraging. It is possible to implement the Knowledge-Based Environment using Object-Oriented Representations.
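
As an illustration of the nearest-neighbor computation that the abstract builds on, the following is a minimal sketch of a k-d tree with a nearest-neighbor query. It is not the thesis's implementation: the function names `build_kd_tree` and `nearest_neighbor`, the tuple-based node layout, and the use of Euclidean distance are all assumptions made for this example, and the "biased" search described in the abstract is not reproduced here.

```python
import math


def build_kd_tree(points, depth=0):
    """Recursively build a k-d tree (hypothetical layout: (point, left, right))."""
    if not points:
        return None
    axis = depth % len(points[0])          # cycle through the coordinate axes
    pts = sorted(points, key=lambda p: p[axis])
    mid = len(pts) // 2                    # the median point splits the set
    return (
        pts[mid],
        build_kd_tree(pts[:mid], depth + 1),
        build_kd_tree(pts[mid + 1:], depth + 1),
    )


def nearest_neighbor(node, query, depth=0, best=None):
    """Return (distance, point) for the stored point closest to `query`."""
    if node is None:
        return best
    point, left, right = node
    d = math.dist(point, query)            # Euclidean distance (an assumption)
    if best is None or d < best[0]:
        best = (d, point)
    axis = depth % len(query)
    diff = query[axis] - point[axis]
    # Search the side of the splitting plane that contains the query first.
    near, far = (left, right) if diff < 0 else (right, left)
    best = nearest_neighbor(near, query, depth + 1, best)
    # Visit the other side only if the splitting plane is closer than the
    # best match found so far; otherwise that subtree cannot hold a closer point.
    if abs(diff) < best[0]:
        best = nearest_neighbor(far, query, depth + 1, best)
    return best


points = [(2, 3), (5, 4), (9, 6), (4, 7), (8, 1), (7, 2)]
tree = build_kd_tree(points)
distance, neighbor = nearest_neighbor(tree, (9, 2))
print(neighbor, round(distance, 3))
```

The pruning test on the splitting plane is what lets the query skip whole subtrees, which is the usual source of the efficiency gain over comparing the query against every point.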

URI

https://etd.iisc.ac.in/handle/2005/7131

Collections

Computer Science and Automation (CSA) [536]