Knowledge-based mining of multi-database for associations

Ananthanarayana, V  S

dc.contributor.advisor	Subramaniam, D K; Narasimha Murty, M
dc.contributor.author	Ananthanarayana, V S
dc.date.accessioned	2026-01-21T10:01:07Z
dc.date.available	2026-01-21T10:01:07Z
dc.date.submitted	2001
dc.identifier.uri	https://etd.iisc.ac.in/handle/2005/8317
dc.description.abstract	Knowledge Discovery in Databases (KDD) uses knowledge engineering tools and database technology to extract hidden patterns from databases. Data mining is a step in the KDD process that finds useful patterns in the data. There are many categories of data mining, and one of the most important among them is association rule mining (ARM). The ARM activity establishes an association between two non-overlapping sets of frequently occurring values in a database. Most ARM activity addressed in the literature is based on a single transaction database. In real-world applications, we have a large collection of data organized in several databases. Mining relations across different databases might reveal useful hidden associations. For example, data about a person’s behavior is distributed across several organizations. More specifically: CUSTOMER database describes a person’s purchase behavior. EMPLOYEE database gives professional information. PATIENT database provides medical history. AIRLINE database gives travel details. There may be an association between salary, region of living, mode of traveling, and disease; for instance, people who have a salary in the range of US$10,000–US$20,000, eat frequently at CITY X hotels, and travel by air in executive class may have cardiovascular disease, where CITY X is marked with a high pollution rating. These hidden associations can be discovered if we mine across multiple databases. Most ARM algorithms are data-driven and hence generate a large number of associations. However, in practice, only a relevant subset of this rule set, which aids in decision-making, is of interest. Such mining activity can be performed using knowledge for mining only those rules that are useful for decision-making. The use of knowledge not only reduces the number of associations generated but also reduces the computational time and space required for mining. The research work presented in this thesis covers a spectrum of novel methods and data structures for efficient generation of associations. The following issues are addressed in this thesis: • Multiple Database-Oriented ARM Activity Not enough attention is paid to mining multiple databases for associations. We call such associations inter-database associations. In this thesis, we propose and implement efficient schemes to obtain inter-database associations. The proposed schemes use domain knowledge in the form of a semantic network to efficiently link various databases. • Knowledge-Based Mining In ARM activity, knowledge in the form of an is-a hierarchy is used for discovering generalized rules. In this thesis, we propose a taxonomy based on functionality and a restricted way of generalizing values, and use it for efficient ARM. • Single Scan of the Database Data handled by ARM activity is very large; methods to handle this activity should be efficient in terms of computational resources like disk and memory. The best possible algorithm reported in the literature requires at least two database scans. In this thesis, we propose schemes that need only a single scan of the database for finding associations. • Abstraction Generation In this thesis, we propose schemes based on intermediate representations (structures) of the data, which we call abstractions. These structures are generated based on data and/or domain knowledge. We use these abstractions instead of the raw data in the database for mining associations. Discovery of associations using abstractions is more efficient because these structures represent data in a compact form and are appropriate for association finding. • Dynamic Mining ARM activity based on change of data, called incremental mining, and change of support value are described in the literature. Mining under incremental change to knowledge without having to access the original database is not reported in the literature. In this thesis, we propose novel schemes for mining that can handle changes in data, knowledge, and support value, which we call dynamic mining. In this thesis, we report four schemes: Extended Inverted File (EIF)-based scheme And-Or Taxonomy (AO taxonomy)-based scheme Pattern Count Tree (PC-tree)-based scheme Multi-database Domain Link Network (MDLN)-based scheme Even though all four schemes support discovery of associations across multiple databases, the PC-tree-based and AO taxonomy-based schemes are basically single-database-oriented schemes. Based on theoretical and experimental studies, we make the following observations: The MDLN structure-based scheme is ideally suited for mining multiple databases for discovering associations. The scheme that exploits knowledge in the best possible way is the AO taxonomy-based scheme. Knowledge helps reduce the number of candidates generated and the number of database scans required. The PC-tree-based scheme is ideally suited for dynamic mining.
dc.language.iso	en_US
dc.relation.ispartofseries	T05136
dc.rights	I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation
dc.subject	Knowledge Discovery in Databases
dc.subject	Multi-database Domain Link Network based scheme
dc.subject	Patient database
dc.title	Knowledge-based mining of multi-database for associations
dc.type	Thesis
dc.degree.name	PhD
dc.degree.level	Doctoral
dc.degree.grantor	Indian Institute of Science
dc.degree.discipline	Engineering

Files in this item

Name:: T05136.pdf
Size:: 90.45Mb
Format:: PDF

View/Open

This item appears in the following Collection(s)

Computer Science and Automation (CSA) [542]

Show simple item record