Efficient Kernel Methods For Large Scale Classification
Abstract
Classification algorithms are widely used in many application domains. Most of these domains deal with massive collections of data and hence demand classification algorithms that scale well with the size of the data sets involved. A classification algorithm is said to be scalable if its time and space requirements do not increase significantly as the training set grows, without compromising generalization performance. The Support Vector Machine (SVM) is one of the most celebrated kernel-based classification methods in machine learning, and an SVM capable of handling large-scale classification problems would be a strong candidate for many real-world applications. SVM training is usually formulated as a Quadratic Programming (QP) problem. Existing solution strategies for this problem have time and space complexity that is at least quadratic in the number of training points. This makes SVM training very expensive even on classification problems with only a few thousand training examples.
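For reference, the textbook soft-margin SVM dual is the QP below, for n training points (x_i, y_i) with y_i in {-1, +1}, a kernel k and a penalty parameter C. These symbols are the standard ones and are assumed here rather than taken from the thesis. The quadratic term couples every pair of training points through the n x n kernel matrix, which is where the (at least) quadratic time and space cost comes from.

```latex
\max_{\alpha \in \mathbb{R}^n} \; \sum_{i=1}^{n} \alpha_i
  - \frac{1}{2} \sum_{i=1}^{n} \sum_{j=1}^{n} \alpha_i \alpha_j \, y_i y_j \, k(x_i, x_j)
\qquad \text{s.t.} \quad 0 \le \alpha_i \le C, \quad \sum_{i=1}^{n} \alpha_i y_i = 0
```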
This thesis addresses the scalability of the training algorithms for both two-class and multiclass Support Vector Machines, and proposes efficient training schemes that reduce the space and time requirements of SVM training. For large-scale two-class classification problems, the thesis discusses a) two selective-sampling-based training schemes for scaling non-linear SVMs and b) clustering-based approaches for handling unbalanced data sets with the Core Vector Machine (CVM). For large-scale multiclass classification problems, the thesis proposes the Multiclass Core Vector Machine (MCVM), a scalable SVM-based multiclass classifier. In MCVM, the multiclass SVM problem is shown to be equivalent to a Minimum Enclosing Ball (MEB) problem, which is then solved using a fast approximate MEB-finding algorithm. Experiments were carried out on several large real-world data sets, such as the IJCNN1 and Acoustic data sets from the LIBSVM page, the Extended USPS data set from the CVM page, and the DARPA (US Department of Defense) network intrusion detection data sets used in the KDD 99 contest. The empirical results show that the proposed classification schemes achieve good generalization performance with low time and space requirements, and scalability experiments on large training sets demonstrate that the proposed schemes scale well. Another contribution of this thesis is a novel soft clustering scheme, Rough Support Vector Clustering (RSVC), which builds on the Soft Minimum Enclosing Ball (SMEB) problem. Experiments on a synthetic data set and on the real-world IRIS data set show that RSVC finds meaningful soft cluster abstractions.
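To give a feel for what a fast approximate MEB algorithm does, the sketch below implements the simple Bădoiu–Clarkson iteration for a (1 + eps)-approximate minimum enclosing ball of points in Euclidean input space: repeatedly find the point farthest from the current center and move the center toward it by a shrinking step, for ceil(1/eps^2) iterations. This is an illustrative assumption, not the MCVM solver described in the thesis, which works in kernel feature space; the function name `approx_meb` and its return values are hypothetical. The farthest points visited form a small core set whose size depends on eps rather than on the number of points.

```python
import math
from typing import List, Sequence, Tuple


def approx_meb(points: List[Sequence[float]], eps: float) -> Tuple[List[float], float, List[int]]:
    """Hypothetical helper: (1 + eps)-approximate MEB via the Badoiu-Clarkson iteration.

    Works on plain Euclidean points (input space), not on a kernel-induced
    feature space as a Core Vector Machine would.
    Returns (center, radius, sorted indices of the core-set points visited).
    """
    center = [float(v) for v in points[0]]  # start from an arbitrary point
    core = {0}
    for k in range(1, math.ceil(1.0 / eps ** 2) + 1):
        # Farthest point from the current center.
        far = max(range(len(points)), key=lambda i: math.dist(points[i], center))
        core.add(far)
        # Move the center toward it by a step of 1 / (k + 1).
        center = [c + (p - c) / (k + 1) for c, p in zip(center, points[far])]
    # The ball around the final center must still enclose every point.
    radius = max(math.dist(p, center) for p in points)
    return center, radius, sorted(core)
```

For example, for the four corners of a 2 x 2 square with eps = 0.1, the returned radius lies between sqrt(2) (the optimal radius) and 1.1 * sqrt(2), and the core set contains only a few of the points.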