Outlier Detection with Applications in Graph Data Mining
Abstract
Outlier detection is an important data mining task because of its applicability to contemporary problems such as fraud detection and anomaly detection in networks. Its significance stems from the general perception that outliers represent evolving, novel patterns in data that are critical to many discovery tasks. The extensive use of data mining techniques across application domains has given rise to a rapid proliferation of research on the outlier detection problem. This has led to the development of numerous methods for detecting outliers in various problem settings. However, most of these methods deal primarily with numeric data. This work therefore considers the problem of outlier detection in categorical data and develops novel methods addressing several research issues. First, a ranking-based algorithm for detecting a likely set of outliers in given categorical data has been developed, employing two independent ranking schemes. Subsequently, the issue of data dimensionality has been addressed by proposing a novel unsupervised feature selection algorithm for categorical data. Similarly, the uncertainty associated with the outlier detection task has been dealt with by developing a novel rough-sets-based categorical clustering algorithm.
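To make the idea of ranking categorical records by outlyingness concrete, the following is a minimal sketch and not the algorithm developed in the thesis. It assumes a single, deliberately simple ranking scheme (the mean relative frequency of a record's attribute values); the function name `rank_by_rarity` and the toy records are hypothetical and chosen only for illustration.

```python
from collections import Counter


def rank_by_rarity(records):
    """Rank categorical records from most to least outlying.

    Assumption for illustration: a record is more outlying when its
    attribute values are rare in their respective columns.
    """
    n = len(records)
    n_attrs = len(records[0])
    # Count how often each value occurs within each attribute (column).
    counts = [Counter(row[j] for row in records) for j in range(n_attrs)]
    # Score = mean relative frequency of the record's values (low = rare).
    scores = [
        sum(counts[j][row[j]] for j in range(n_attrs)) / (n_attrs * n)
        for row in records
    ]
    # Sort indices by ascending score, so the rarest records come first.
    order = sorted(range(n), key=lambda i: scores[i])
    return order, scores


# Hypothetical toy data: (protocol, service, label-like attribute).
records = [
    ("tcp", "http", "normal"),
    ("tcp", "http", "normal"),
    ("tcp", "http", "normal"),
    ("udp", "dns", "normal"),
    ("icmp", "echo", "flood"),
]
order, scores = rank_by_rarity(records)
print(order[:2])  # indices of the two most outlying records
```

A second, independent ranking scheme could be combined with this one (for example, by taking records that rank highly under either scheme), but how the thesis combines its two schemes is not stated in the abstract.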
Because the data in many real-life applications is networked, as in computer communication networks, social networks of friends, citation networks of documents, and hyperlinked networks of web pages, outlier detection (also known as anomaly detection) in the graph representation of network data is an important pattern discovery activity. Accordingly, a novel graph mining method based on the concept of community detection in graphs has been envisaged in this thesis. In addition to finding anomalous nodes and anomalous edges, this method is capable of detecting various higher-level anomalies that are arbitrary sub-graphs of the input graph. These ideas have been further extended in this thesis to characterize the time-varying behavior of outliers (anomalies) in dynamic network data by defining various categories of temporal outliers (anomalies). Characterizing the behavior of such outliers as the network evolves over time is critical for discovering anomalous connectivity patterns with potentially adverse effects, such as intrusions into a computer network. To deal with temporal outlier detection in single-instance network/graph data, the link prediction task has been leveraged in this thesis to produce multiple instances of the input graph. Thus, various outlier detection principles have been successfully applied for mining various categories of temporal outliers (anomalies) in the graph representation of network data.
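As a rough illustration of how community structure can expose anomalous nodes and edges, the sketch below flags edges that cross community boundaries and nodes whose neighbors lie mostly outside their own community. It is not the method proposed in the thesis. The community labels are assumed to be given (for instance, as the output of any community detection algorithm), and the function name `cross_community_anomalies`, the `threshold` parameter, and the toy graph are hypothetical.

```python
def cross_community_anomalies(edges, community, threshold=0.5):
    """Flag cross-community edges and nodes that mostly link outside.

    edges:      list of undirected (u, v) pairs
    community:  dict mapping each node to a community label (assumed given)
    threshold:  fraction of outside neighbors above which a node is flagged
    """
    # Build an undirected adjacency map.
    neighbors = {}
    for u, v in edges:
        neighbors.setdefault(u, set()).add(v)
        neighbors.setdefault(v, set()).add(u)

    # Edges whose endpoints belong to different communities.
    bridge_edges = [(u, v) for u, v in edges if community[u] != community[v]]

    # Nodes whose share of out-of-community neighbors exceeds the threshold.
    suspicious_nodes = []
    for node, nbrs in neighbors.items():
        outside = sum(1 for n in nbrs if community[n] != community[node])
        if outside / len(nbrs) > threshold:
            suspicious_nodes.append(node)
    return bridge_edges, sorted(suspicious_nodes)


# Hypothetical toy graph: two triangles plus a node "x" that is labeled
# with the first community but connects mostly to the second.
edges = [
    ("a", "b"), ("b", "c"), ("a", "c"),
    ("d", "e"), ("e", "f"), ("d", "f"),
    ("x", "a"), ("x", "d"), ("x", "e"), ("x", "f"),
]
community = {"a": 0, "b": 0, "c": 0, "x": 0, "d": 1, "e": 1, "f": 1}
bridge_edges, suspicious_nodes = cross_community_anomalies(edges, community)
print(bridge_edges, suspicious_nodes)
```

Higher-level anomalies, such as anomalous sub-graphs, would require grouping flagged nodes and edges into connected structures, which this sketch does not attempt.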
Related items
Showing items related by title, author, creator and subject.
Optimum Event Detection In Wireless Sensor Networks
Karumbu, Premkumar (2013-06-12) We investigate sequential event detection problems arising in Wireless Sensor Networks (WSNs). A number of battery-powered sensor nodes of the same sensing modality are deployed in a region of interest (ROI). By an event ...
Electrochemical Biosensors based on Novel Receptors for Diabetes Management
Kumar, Vinay (2018-01-15) To address the challenge of accurate, low cost and robust biosensors for diabetes management and early detection of diabetes complications, we have developed novel, robust sensing chemistry (or receptors) for electrochemical ...
Design And Development Of A Liquid Scintillator Based System For Failed Fuel Detection And Locating System In Nuclear Reactors
Sumanth, Panyam (2009-07-02) Failed fuel refers to the breach in the fuel-clad of an irradiated fuel assembly in a nuclear reactor. Neutron detection or gamma detection is commonly used in Failed Fuel Detection and Locating (FFDL) system to monitor ...