Data-efficient Deep Learning Algorithms for Computer Vision Applications
Abstract
The performance of any deep learning model depends heavily on the quantity and quality of the available training data. The generalization of the trained deep models improves with the availability of a large number of training samples and hence these models are often referred to as ‘data-hungry’. However, large scale datasets may not always be available in practice due to proprietary/privacy reasons or because of the high cost of generation, annotation, transmission and storage of data. Hence, efficient utilization of available data is of utmost importance, and this gives rise to a class of ML problems, which is often referred to as “data-efficient deep learning”. In this thesis we study the various types of such problems for diverse applications in computer vision, where the aim is to design deep neural network-based solutions that do not rely on the availability of large quantities of training data to attain the desired performance goals. Under the aforementioned thematic area, this thesis focuses on three different scenarios, namely - (1) learning in the absence of training data, (2) learning with limited training data and (3) learning using selected subset of training data.
Absence of training data: Pre-trained deep models hold their learnt knowledge in the form of model parameters that act as ‘memory’ for the trained models and help them generalize well on unseen data. In the first part of this thesis, we present solutions to a diverse set of ‘zero-shot’ tasks, where in absence of any training data (or even their statistics) the trained models are leveraged to synthesize data-representative samples. We dub them Data Impressions (DIs), which act as proxy to the training data. As the DIs are not tied to any specific application, we show their utility in solving several CV/ML tasks under the challenging data-free setup, such as unsupervised domain adaptation, continual learning as well as knowledge distillation (KD). We also study the adversarial robustness of lightweight models trained via knowledge distillation using DIs. Further, we demonstrate the efficacy of DIs in generating data-free Universal Adversarial Perturbations (UAPs) with better fooling rates. However, one limiting factor of this solution is the relatively high computation (i.e., several rounds of backpropagation) to synthesize each sample. In fact, the other natural alternatives such as GAN based solutions also suffer from similar computational overhead and complicated training procedures. This motivated us to explore the utility of target class-balanced ‘arbitrary’ data as transfer set, which achieves competitive distillation performance and can yield strong baselines for data-free KD. We have also proposed data-free solutions beyond classification by extending zero-shot distillation to the object detection task, where we compose the pseudo transfer set by synthesizing multi-object impressions from a pretrained faster RCNN model.
Another concern with the deployment of given trained models is their vulnerability against adversarial attacks. The popular adversarial training strategies rely on availability of original training data or explicit regularization-based techniques. On the contrary, we propose test-time adversarial defense (detection and correction framework), which can provide robustness in absence of training data and their statistics. We observe significant improvements in adversarial accuracy with minimal drop in clean accuracy against state-of-the-art ‘Auto Attack’ without having to retrain the model. Further, we explore an even more challenging problem setup and make the first attempt to provide adversarial robustness to ‘black box’ models (i.e., model architecture, weights, training details are inaccessible) under a complete data-free set up. Our method minimizes adversarial contamination on perturbed samples via proposed ‘wavelet noise remover’ (WNR) that remove coefficients corresponding to high frequency components which are most likely to be corrupted by adversarial attack, and recovers the lost image content by training a ‘regenerator’ network. This results in a high boost in adversarial accuracy when WNR combined with the trained regenerator network is prepended to black box network.
Limited training data: In the second part, we assume the availability of a few training samples, where access to trained models may or may not be provided. In the few-shot setup, existing works obtain robustness using sophisticated meta-learning techniques which rely on the generation of adversarial samples in every episode of training - thereby making it computationally expensive. We propose the first computationally cheaper non-meta learning approach for robust few-shot learning that does not require any adversarial sample. We perform pretraining using self-distillation to make the feature representation of low-frequency samples close to original samples of base classes. Similarly, we also improve the discriminability of low-frequency query set features that further boost the robustness. Our method obtains massive improvement in adversarial performance while being ≈5x faster compared to state-of-the-art adversarial meta-learning methods. However, empirical robustness methods do not guarantee robustness of the trained models against all the adversarial perturbations possible within a given threat model. Thus, we also propose a novel problem of certified robustness of pretrained models in limited data settings. Our method provides a novel sample-generation strategy that synthesize ‘boundary’ and ‘interpolated’ samples to augment the limited training data and uses them in training the denoiser (prepended to pretrained classifier) via aligning the feature representations at multiple granularities (both instance and distribution levels). We achieve significant improvements across diverse sample budgets and noise levels in the white-box and observe similar performance under challenging black-box setup.
Selected subset of training data: In the third part, we enforce efficient utilization via intelligently doing selective sampling on existing training datasets to obtain representative samples for the target task such as distillation, incremental learning and person-reid. Adversarial attacks recently have shown robustness bias, where certain subgroups in a dataset (e.g. based on class, gender, etc.) are less robust than others. Existing works characterize a subgroup’s robustness bias by only checking individual sample’s proximity to the decision boundary. We propose a holistic approach for quantifying adversarial vulnerability of a sample by combining different perspectives and further develop a trustworthy system to alert the humans about the incoming samples that are highly likely to be misclassified. Moreover, we demonstrate the utility of the proposed metric for data (and time)-efficient knowledge distillation which achieves better performance compared to competing baselines. Other applications such as incremental learning and video based person-reid can also be framed as a subset selection problem where representative samples need to be selected. We leverage DPP (Determinantal Point Process) for choosing the relevant and diverse samples. In Incremental learning, we propose a new variant of k-DPP that uses the RBF kernel (termed as “RBF k-DPP”) for challenging task of animal pose estimation and further tackle class imbalance by using image warping as an augmentation technique to generate varied poses for a given image, leading to further gains in performance. In video based re-id, we propose SLGDPP method which exploits the sequential nature of the frames in video while avoiding noisy and redundant (correlated) frames, resulting in outperforming the baseline sampling methods.