Data clustering and evolutionary algorithms for data mining

Babu, T Ravindra

dc.contributor.advisor	Narasimha Murty, M
dc.contributor.author	Babu, T Ravindra
dc.date.accessioned	2025-10-30T10:57:33Z
dc.date.available	2025-10-30T10:57:33Z
dc.date.submitted	2000
dc.identifier.uri	https://etd.iisc.ac.in/handle/2005/7273
dc.description.abstract	In this work, we present a scheme for selecting optimal prototypes from large data sets as part of the data mining process. Data mining is defined as the non-trivial extraction of implicit, previously unknown and potentially useful information, such as knowledge rules, constraints and regularities, from data in databases. The prototypes are chosen so that they suffice to classify any new input pattern with reasonably high classification accuracy; such representative patterns are also good enough for generating association rules. Handwritten character data is used for all the experiments in this work. The prototypes are selected using medoids and leaders. The medoid of a cluster is its most centrally located pattern; both the medoids and the leaders of a cluster are members of that cluster. After an initial set of prototypes that provides high classification accuracy has been selected, evolutionary algorithms are used to compute the optimal set of prototypes. The dimensionality of the optimal prototypes is then reduced by optimal feature selection, again using evolutionary algorithms, so that each prototype is represented by a minimum number of features. The work can be summarized in five stages: (1) data pre-processing, (2) selection of representative patterns using medoids and leaders, (3) optimal prototype selection using Steady State Genetic Algorithms (SSGA), (4) optimal feature selection using SSGA, and (5) association rule generation for classification. The work addresses the challenges in (1) clustering large datasets, (2) demonstrating the utility of medoids, leaders and their variants in finding prototypes, (3) using SSGA for learning, and (4) evolving a general procedure for handling large data sets of labeled patterns within the framework of Knowledge Discovery in Databases (KDD).
A large dataset of handwritten digits is used for all the experiments reported in the thesis. Selecting prototypes by means of medoids and leaders provided good classification accuracy (CA). Computing medoids is expensive in computation time, whereas computing leaders is less time consuming. Among the alternative approaches for reducing the number of prototypes, distance-threshold based reduction gave the best results, both in CA and in the number of prototypes removed. Among the optimal prototype selection approaches, the SSGA for medoid selection, which maps medoids one-to-one to the alleles of a chromosome, yielded the fewest prototypes, while optimal leaders gave the highest CA. Optimal feature selection using SSGA also provided good results. Together, the optimal prototypes, with each pattern represented by an optimal feature set, help in generating effective association rules, and such a combination can classify new patterns with high classification accuracy.
dc.language.iso	en_US
dc.relation.ispartofseries	T04719
dc.rights	I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.
dc.subject	Data Mining
dc.subject	Sub-Lexical Processing
dc.subject	Verbal Short-Term Memory
dc.title	Data clustering and evolutionary algorithms for data mining
dc.degree.name	MSc Engg
dc.degree.level	Masters
dc.degree.grantor	Indian Institute of Science
dc.degree.discipline	Engineering
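
The abstract names leaders and medoids as prototypes but does not spell out the clustering procedure, so the following Python sketch is a hypothetical illustration under stated assumptions, not the thesis's implementation. It assumes Euclidean distance, runs a single-pass leader algorithm separately on each class with a user-chosen distance threshold, optionally replaces each leader by the medoid of its cluster, and classifies new patterns with the nearest-neighbour (1-NN) rule. All function names are illustrative. It shows why leaders are cheap (one pass over the data) while medoids are costly (all pairwise distances within a cluster), and that both are actual members of their clusters.

```python
import math
from collections import defaultdict
from typing import Dict, List, Sequence, Tuple

Vector = Sequence[float]
Labeled = Tuple[Vector, int]  # (feature vector, class label)


def leader_clusters(patterns: List[Vector], threshold: float) -> List[List[Vector]]:
    """Single-pass leader clustering (Euclidean distance assumed).

    Each pattern joins the cluster of its nearest leader if that leader lies
    within `threshold`; otherwise the pattern becomes a new leader. The first
    member of each returned cluster is its leader, so leaders are data points.
    """
    clusters: List[List[Vector]] = []
    for p in patterns:
        nearest = min(clusters, key=lambda c: math.dist(p, c[0]), default=None)
        if nearest is not None and math.dist(p, nearest[0]) <= threshold:
            nearest.append(p)
        else:
            clusters.append([p])
    return clusters


def medoid(cluster: List[Vector]) -> Vector:
    """Member with the smallest total distance to all other members.

    This needs O(n^2) distance computations per cluster, whereas the leader
    pass needs only one distance per pattern per existing leader.
    """
    return min(cluster, key=lambda c: sum(math.dist(c, q) for q in cluster))


def select_prototypes(data: List[Labeled], threshold: float,
                      use_medoids: bool = False) -> List[Labeled]:
    """Cluster each class separately; keep one prototype per cluster."""
    by_class: Dict[int, List[Vector]] = defaultdict(list)
    for x, y in data:
        by_class[y].append(x)
    prototypes: List[Labeled] = []
    for label, patterns in by_class.items():
        for cluster in leader_clusters(patterns, threshold):
            rep = medoid(cluster) if use_medoids else cluster[0]
            prototypes.append((rep, label))
    return prototypes


def classify(x: Vector, prototypes: List[Labeled]) -> int:
    """1-NN rule (assumed classifier): label of the nearest prototype."""
    return min(prototypes, key=lambda pl: math.dist(x, pl[0]))[1]


if __name__ == "__main__":
    train = [((0.0, 0.0), 0), ((0.2, 0.1), 0), ((0.1, 0.3), 0),
             ((5.0, 5.0), 1), ((5.2, 4.9), 1), ((9.0, 9.0), 1)]
    protos = select_prototypes(train, threshold=1.0)
    print(len(protos), classify((4.8, 5.1), protos))  # prints: 3 1
```

A larger `threshold` merges more patterns under each leader and so yields fewer prototypes; a smaller one keeps more of the training set. Running the clustering per class keeps every prototype's label unambiguous.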

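Likewise, the following is a hypothetical Python sketch of steady-state genetic prototype selection, not the algorithm used in the thesis. Each allele of a binary chromosome maps one-to-one to a candidate prototype (1 = keep, 0 = drop). The fitness function (1-NN accuracy on a validation set minus a penalty on the fraction of prototypes kept), binary tournament selection, one-point crossover, bit-flip mutation and every parameter value are assumptions. "Steady state" here means that one offspring is produced per iteration and it replaces the worst chromosome only if it is fitter, so the best chromosome is never lost.

```python
import math
import random
from typing import List, Sequence, Tuple

Vector = Sequence[float]
Labeled = Tuple[Vector, int]  # (feature vector, class label)


def accuracy(prototypes: List[Labeled], validation: List[Labeled]) -> float:
    """Fraction of validation patterns labeled correctly by the 1-NN rule."""
    if not prototypes:
        return 0.0
    correct = sum(
        min(prototypes, key=lambda pl: math.dist(x, pl[0]))[1] == y
        for x, y in validation
    )
    return correct / len(validation)


def decode(mask: List[int], candidates: List[Labeled]) -> List[Labeled]:
    """One-to-one mapping: allele i = 1 keeps candidate prototype i."""
    return [c for bit, c in zip(mask, candidates) if bit]


def ssga_select(candidates: List[Labeled], validation: List[Labeled],
                pop_size: int = 20, iterations: int = 500,
                p_mut: float = 0.05, penalty: float = 0.1,
                seed: int = 0) -> List[Labeled]:
    rng = random.Random(seed)
    n = len(candidates)

    def fitness(mask: List[int]) -> float:
        # Assumed fitness: accuracy minus a penalty on the fraction kept.
        kept = decode(mask, candidates)
        return accuracy(kept, validation) - penalty * len(kept) / n

    # Seed one chromosome that keeps every candidate; the rest are random.
    population = [[1] * n] + [[rng.randint(0, 1) for _ in range(n)]
                              for _ in range(pop_size - 1)]
    scores = [fitness(m) for m in population]

    def tournament() -> List[int]:
        # Binary tournament: the fitter of two random chromosomes.
        i, j = rng.sample(range(pop_size), 2)
        return population[i] if scores[i] >= scores[j] else population[j]

    for _ in range(iterations):
        a, b = tournament(), tournament()
        cut = rng.randrange(1, n) if n > 1 else 0  # one-point crossover
        child = [1 - g if rng.random() < p_mut else g  # bit-flip mutation
                 for g in a[:cut] + b[cut:]]
        # Steady-state replacement of the worst member, only if improved.
        worst = min(range(pop_size), key=scores.__getitem__)
        child_score = fitness(child)
        if child_score > scores[worst]:
            population[worst], scores[worst] = child, child_score

    best = max(range(pop_size), key=scores.__getitem__)
    return decode(population[best], candidates)
```

Seeding the population with the all-ones chromosome is a design choice: because replacement is elitist, the returned subset can never score worse than simply keeping every candidate prototype.
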
Files in this item

Name:: T04719.pdf
Size:: 8.547Mb
Format:: PDF

This item appears in the following Collection(s)

Computer Science and Automation (CSA) [545]