Learning From Examples Using Hierarchical Counterfactual Expressions
Abstract
In this study, we develop algorithms for learning
concepts from examples.
Learning is the capability that allows a system to
improve its performance. It involves the ability to correct
errors, acquire domain knowledge, generate algorithms, and
formulate new abstractions and solutions by observing the
environment or by drawing analogies to past experience.
Machines with such abilities would have an edge over their
human counterparts, as they do not suffer from poor memory,
distracted attention, low efficiency, or the difficulty of
transferring acquired knowledge from one individual to another.
Machine learning manifests itself as a spectrum of
information-processing tasks. Learning by induction involves
presenting a system with labelled examples. The system must
generalize from the examples and find higher-level rules, which
can then be used to guide the system more efficiently and
effectively.
A flexible data structure is required to represent
knowledge and to capture any internal structure in the data.
Additionally, since knowledge is dynamic, the data structure
must be easily modifiable to incorporate new knowledge.
Introduction
We have developed algorithms to learn concepts using the
flexible data structure HC-expression to represent them.
HC-expressions are tree-like: each node represents a subset of
the event space, and the arcs, labelled positive and negative
on alternate levels, lead to positive and negative
exception nodes.
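The tree structure described above can be sketched as a small recursive class. The class and field names below are illustrative assumptions for this sketch, not the thesis's actual implementation; the key property captured is that arc labels alternate by level, so an event covered by a negative exception is excluded unless a deeper positive exception reclaims it.

```python
# Minimal sketch of an HC-expression node; names are illustrative
# assumptions, not the thesis's actual data structure.
class HCNode:
    def __init__(self, description, positive=True):
        self.description = description  # predicate over the event space
        self.positive = positive        # arc label; alternates by level
        self.exceptions = []            # child exception nodes

    def add_exception(self, description):
        # Children sit one level deeper, so their arc label flips.
        child = HCNode(description, positive=not self.positive)
        self.exceptions.append(child)
        return child

    def covers(self, event):
        # An event is covered if it satisfies the description and is
        # not claimed by any exception subtree. The recursion encodes
        # the alternation: an exception of an exception reinstates.
        if not self.description(event):
            return False
        return not any(ex.covers(event) for ex in self.exceptions)
```

For example, a root covering non-negative integers, a negative exception for odd numbers, and a positive exception reinstating 7 would classify 4 and 7 as covered but 3 as not.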
1) We implemented the algorithm 'Oneshot', which learns a
concept description given all the positive and negative
events together. It was used on Ayurvedic Jaundice data.
The descriptions generated were simple and the running time
was not significant.
2) An incremental algorithm was designed and implemented
which develops the concept description by consuming one event
at a time. The algorithm avoids the pitfalls of incremental
learning schemes by introducing controlled generalization and
specialization, achieved by using a 'similarity' measure as
in clustering in Pattern Recognition. If there were too many
exceptions to a node, it was partitioned in the event space.
The algorithm yielded simple descriptions and needed less
time than the Oneshot algorithm.
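A similarity-gated update of this kind can be sketched as follows. The similarity measure (fraction of matching attributes), the threshold, the exception limit, and the node layout are all assumptions made for this example; the thesis's actual measure and control parameters may differ.

```python
# Illustrative sketch of a similarity-gated incremental update.
# The measure, threshold, and node layout are assumptions for this
# example, not the thesis's algorithm.
def similarity(a, b):
    # Fraction of attribute positions on which two events agree.
    return sum(x == y for x, y in zip(a, b)) / len(a)

def update(node, event, label, threshold=0.5, max_exceptions=3):
    """Consume one labelled event; generalize the node if the event
    is close to what it already covers, otherwise specialize by
    adding an exception child."""
    sim = max((similarity(event, e) for e in node["events"]), default=0.0)
    if label == node["label"] and sim >= threshold:
        node["events"].append(event)  # controlled generalization
    else:
        node["exceptions"].append({"label": label, "events": [event],
                                   "exceptions": []})
    # Signal that the node has accumulated too many exceptions and
    # should be partitioned in the event space (partitioning itself
    # is not shown here).
    return len(node["exceptions"]) > max_exceptions
```

The returned flag models the partitioning trigger: a caller would split the node's region of the event space when too many exceptions pile up, rather than letting the exception list grow unboundedly.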
Possible parallelism is discussed:
a) within a single HC-expression, by evaluating sibling nodes
concurrently. This is feasible on a highly parallel machine
with little communication overhead; and
b) by merging two or more HC-expressions. The input data
could be partitioned into subsets and an HC-expression
generated for each of them in parallel. Later these could be
merged.
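Scheme (b) can be outlined with a standard worker pool. Here `build_expression` and `merge` are placeholders standing in for the real HC-expression construction and merge routines, which the thesis does not specify at this point; only the partition/build-in-parallel/merge shape is being illustrated.

```python
# Sketch of scheme (b): partition the input events, build a structure
# for each subset in parallel, then merge. build_expression and merge
# are placeholders, not the thesis's actual routines.
from concurrent.futures import ThreadPoolExecutor

def build_expression(events):
    # Placeholder: a real implementation would run the Oneshot
    # algorithm on this subset; here we just collect the events.
    return set(events)

def merge(expressions):
    # Placeholder merge: union of the per-partition structures. A
    # real merge would have to reconcile overlapping exception trees.
    result = set()
    for expr in expressions:
        result |= expr
    return result

def parallel_learn(events, n_parts=4):
    # Round-robin partition of the input into n_parts subsets.
    parts = [events[i::n_parts] for i in range(n_parts)]
    with ThreadPoolExecutor(max_workers=n_parts) as pool:
        exprs = list(pool.map(build_expression, parts))
    return merge(exprs)
```

The interesting design question, which the placeholder union sidesteps, is how two HC-expressions with conflicting exception structure are reconciled during the merge.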
Contribution of the Thesis
3) Lastly, we compare and contrast the following learning
algorithms: Candidate Elimination on Version Spaces; ID3, ID4
and ID5, which build decision trees; ExceL, which generates
rules with exceptions; Counterfactuals, which generates
multilevel counterfactuals; and HC-expressions. The
comparison covers: input and rule representation, the
background knowledge and evaluation functions required,
expression of disjunctive concepts, incremental learning
ability, handling of noise, and possible parallelism.
Organization of the Thesis
In the introduction we discuss issues in Machine Learning
in general, with emphasis on learning by induction, and make
a case for incremental learning algorithms. In chapter 2, we
outline the data structure HC-expression and the Oneshot
algorithm for learning a description. In chapter 3, we
present our incremental algorithm and some experimental
results, and discuss possible parallelism with HC-expressions.
In chapter 4, we present a comparative study of
hierarchical inductive learning algorithms. In chapter 5,
our conclusions and suggestions for future work are
presented.

