Show simple item record

dc.contributor.advisorSubramaniam, D K; Narasimha Murty, M
dc.contributor.authorAnanthanarayana, V S
dc.date.accessioned2026-01-21T10:01:07Z
dc.date.available2026-01-21T10:01:07Z
dc.date.submitted2001
dc.identifier.urihttps://etd.iisc.ac.in/handle/2005/8317
dc.description.abstractKnowledge Discovery in Databases (KDD) uses knowledge engineering tools and database technology to extract hidden patterns from databases. Data mining is a step in the KDD process that finds useful patterns in the data. There are many categories of data mining, and one of the most important among them is association rule mining (ARM). The ARM activity establishes an association between two non-overlapping sets of frequently occurring values in a database. Most ARM activity addressed in the literature is based on a single transaction database. In real-world applications, we have a large collection of data organized in several databases. Mining relations across different databases might reveal useful hidden associations. For example, data about a person’s behavior is distributed across several organizations. More specifically: CUSTOMER database describes a person’s purchase behavior. EMPLOYEE database gives professional information. PATIENT database provides medical history. AIRLINE database gives travel details. There may be an association between salary, region of living, mode of traveling, and disease; for instance, people who have a salary in the range of US$10,000–US$20,000, eat frequently at CITY X hotels, and travel by air in executive class may have cardiovascular disease, where CITY X is marked with a high pollution rating. These hidden associations can be discovered if we mine across multiple databases. Most ARM algorithms are data-driven and hence generate a large number of associations. However, in practice, only a relevant subset of this rule set, which aids in decision-making, is of interest. Such mining activity can be performed using knowledge for mining only those rules that are useful for decision-making. The use of knowledge not only reduces the number of associations generated but also reduces the computational time and space required for mining. The research work presented in this thesis covers a spectrum of novel methods and data structures for efficient generation of associations. The following issues are addressed in this thesis: • Multiple Database-Oriented ARM Activity Not enough attention is paid to mining multiple databases for associations. We call such associations inter-database associations. In this thesis, we propose and implement efficient schemes to obtain inter-database associations. The proposed schemes use domain knowledge in the form of a semantic network to efficiently link various databases. • Knowledge-Based Mining In ARM activity, knowledge in the form of an is-a hierarchy is used for discovering generalized rules. In this thesis, we propose a taxonomy based on functionality and a restricted way of generalizing values, and use it for efficient ARM. • Single Scan of the Database Data handled by ARM activity is very large; methods to handle this activity should be efficient in terms of computational resources like disk and memory. The best possible algorithm reported in the literature requires at least two database scans. In this thesis, we propose schemes that need only a single scan of the database for finding associations. • Abstraction Generation In this thesis, we propose schemes based on intermediate representations (structures) of the data, which we call abstractions. These structures are generated based on data and/or domain knowledge. We use these abstractions instead of the raw data in the database for mining associations. Discovery of associations using abstractions is more efficient because these structures represent data in a compact form and are appropriate for association finding. • Dynamic Mining ARM activity based on change of data, called incremental mining, and change of support value are described in the literature. Mining under incremental change to knowledge without having to access the original database is not reported in the literature. In this thesis, we propose novel schemes for mining that can handle changes in data, knowledge, and support value, which we call dynamic mining. In this thesis, we report four schemes: Extended Inverted File (EIF)-based scheme And-Or Taxonomy (AO taxonomy)-based scheme Pattern Count Tree (PC-tree)-based scheme Multi-database Domain Link Network (MDLN)-based scheme Even though all four schemes support discovery of associations across multiple databases, the PC-tree-based and AO taxonomy-based schemes are basically single-database-oriented schemes. Based on theoretical and experimental studies, we make the following observations: The MDLN structure-based scheme is ideally suited for mining multiple databases for discovering associations. The scheme that exploits knowledge in the best possible way is the AO taxonomy-based scheme. Knowledge helps reduce the number of candidates generated and the number of database scans required. The PC-tree-based scheme is ideally suited for dynamic mining.
dc.language.isoen_US
dc.relation.ispartofseriesT05136
dc.rightsI grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation
dc.subjectKnowledge Discovery in Databases
dc.subjectMulti-database Domain Link Network based scheme
dc.subjectPatient database
dc.titleKnowledge-based mining of multi-database for associations
dc.typeThesis
dc.degree.namePhD
dc.degree.levelDoctoral
dc.degree.grantorIndian Institute of Science
dc.degree.disciplineEngineering


Files in this item

Thumbnail

This item appears in the following Collection(s)

Show simple item record