Show simple item record

dc.contributor.advisor: Venkatesh Babu, R
dc.contributor.author: Addepalli, Sravanti
dc.date.accessioned: 2024-08-23T04:50:22Z
dc.date.available: 2024-08-23T04:50:22Z
dc.date.submitted: 2024
dc.identifier.uri: https://etd.iisc.ac.in/handle/2005/6604
dc.description.abstract: Deep Neural Networks achieve near-human performance on several benchmark datasets, yet they are not as robust as humans. Their success relies on the proximity of test samples to the distribution of the training data, resulting in unexpected behavior under even minor distribution shifts at inference time. Deep Networks are also known to be susceptible to adversarial attacks: carefully crafted imperceptible perturbations to their inputs that can cause a classification model to confidently misclassify images into unrelated classes. The rapid adoption of Deep Networks in several critical applications makes it imperative to understand these failure modes and to develop reliable risk-mitigation strategies. This thesis focuses on developing efficient and effective methods for improving the robustness of Deep Networks to both adversarial attacks and distribution shifts.

The thesis is organized into four parts. In the first part, we develop Efficient Adversarial Defenses to overcome the large computational overhead of existing adversarial training methods. In the second part, we propose methods for Improving the Effectiveness of Adversarial Training by mitigating the robustness-accuracy trade-off that limits its performance and utility. In the third part, we propose efficient and effective algorithms for Self-Supervised Learning of Robust Representations, and in the fourth part, we propose methods for Improving Robustness to Distribution Shifts in Data.

Efficient Adversarial Defenses: State-of-the-art adversarial defenses are computationally expensive because they rely on multi-step adversarial training, where the training data is augmented with adversarially perturbed images that are typically generated using ten steps of optimization. To overcome this overhead, we propose the Bit Plane Feature Consistency Regularizer (BPFC), which achieves adversarial robustness without generating adversarial attacks during training, by imposing consistency on the representations of differently quantized images. We further develop methods that improve robustness while maintaining a low computational cost by using single-step adversarial attacks for training. Single-step adversarial training is known to converge to a degenerate solution with sub-optimal robustness, since gradients become obfuscated at the data samples, leading to the generation of weaker attacks during training. To mitigate this, we propose Guided Adversarial Training (GAT) and Nuclear Norm Adversarial Training (NuAT), which explicitly enforce smoothness of the learned function in the vicinity of each data sample, thereby preventing obfuscated gradients and yielding improvements over existing single-step defenses, as well as several multi-step defenses.
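To make the quantized-image consistency idea behind BPFC concrete, here is a minimal PyTorch-style sketch; the helper quantize_msb and the weight lam are hypothetical simplifications made for exposition, not the exact regularizer from the thesis:

    import torch
    import torch.nn.functional as F

    def quantize_msb(x, bits=4):
        # Keep only the most significant bits of each pixel (x in [0, 1]);
        # a coarse stand-in for bit-plane slicing.
        levels = 2 ** bits
        return torch.floor(x * (levels - 1)) / (levels - 1)

    def bpfc_style_loss(model, x, y, lam=1.0):
        # Cross-entropy on the clean image, plus a penalty that pulls the
        # representations of the clean and quantized images together.
        logits_clean = model(x)
        logits_quant = model(quantize_msb(x))
        ce = F.cross_entropy(logits_clean, y)
        consistency = (logits_clean - logits_quant).pow(2).sum(dim=1).mean()
        return ce + lam * consistency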
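Similarly, a sketch of a single-step training step with a nuclear-norm smoothing term in the spirit of NuAT; the random start, the cross-entropy attack step, and the weight lam are simplifying assumptions:

    import torch
    import torch.nn.functional as F

    def nuat_style_step(model, x, y, eps=8/255, lam=4.0):
        # Single-step attack from a random start (mitigates gradient masking).
        delta = eps * torch.sign(torch.randn_like(x))
        delta.requires_grad_(True)
        attack_loss = F.cross_entropy(model(x + delta), y)
        grad = torch.autograd.grad(attack_loss, delta)[0]
        x_adv = x + (delta + eps * grad.sign()).clamp(-eps, eps)
        x_adv = x_adv.clamp(0, 1).detach()

        logits_clean = model(x)
        logits_adv = model(x_adv)
        ce = F.cross_entropy(logits_clean, y)
        # Nuclear norm of the clean-vs-adversarial logit difference enforces
        # smoothness of the function around the batch samples.
        smooth = torch.linalg.matrix_norm(logits_clean - logits_adv, ord='nuc')
        return ce + lam * smooth / x.size(0)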
Improving the Effectiveness of Adversarial Training: While adversarial training significantly improves the robustness of Deep Networks, a key challenge that limits its practical use is the associated drop in natural or clean accuracy, referred to as the robustness-accuracy trade-off. To address this, we first propose the Feature Level Stochastic Smoothing (FLSS) based classifier, which introduces stochasticity in the network predictions and uses it to smooth decision boundaries and to reject low-confidence predictions, thereby boosting the robustness and clean accuracy on the accepted samples. We further investigate the reasons for the poor robustness-accuracy trade-off at large perturbation bounds, where some attacks change the perception of a human or an Oracle while others do not. The proposed Oracle-Aligned Adversarial Training (OAAT) overcomes this trade-off by introducing specific attack and defense losses for Oracle-Sensitive and Oracle-Invariant adversarial examples. While the robustness-accuracy trade-off can also be alleviated by training on more diverse data, complex data augmentations have not been successful with adversarial training. We investigate the reasons for this trend and propose Diverse Augmentation based Joint Adversarial Training (DAJAT) to address it, using separate batch-normalization layers for simple and complex augmentations, together with a Jensen-Shannon divergence loss to encourage their joint learning.

Self-Supervised Learning of Robust Representations: Instance-discrimination based Self-Supervised Learning (SSL) methods have shown success in learning transferable representations without labeled training data. However, these methods are more computationally expensive than supervised training. We investigate the reasons for their slow convergence and propose to accelerate it by combining them with pretext tasks such as rotation prediction, which reduce the noise in the training objective. We further use these pretrained SSL models in a teacher-student setting to train adversarially robust models without labels. We propose Projected Feature Adversarial Training (ProFeAT), where the pretrained projector of the teacher is reused during distillation to prevent the student model from overfitting to the teacher's training objectives, resulting in a significantly better robustness-accuracy trade-off.

Improving Robustness to Distribution Shifts in Data: Deep Networks are known to be sensitive to even minor distribution shifts at inference time. We aim to understand this behavior and put forth the (simple) Feature Replication Hypothesis to explain it. We further propose the Feature Reconstruction Regularizer (FRR), which encourages the use of more diverse features for classification by enforcing that the learned features can be reconstructed from the logits, thereby improving robustness to distribution shifts. We next propose Diversify-Aggregate-Repeat Training (DART), which trains diverse models using different augmentations (or domains) to explore the loss basin, and repeatedly aggregates their weights over the course of training to combine their expertise and obtain improved generalization. Finally, we aim to leverage the superior generalization of black-box Vision-Language Models (VLMs) to improve out-of-distribution (OOD) generalization in vision tasks. Towards this, we propose Vision-Language to Vision - Align, Distill, Predict (VL2V-ADiP), a teacher-student setting that first aligns the teacher representations with those of a pretrained student model and then performs distillation.
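A minimal sketch of the two DAJAT ingredients described above, separate batch-norm statistics per augmentation type and a Jensen-Shannon consistency loss between the two views; the class and function names are illustrative assumptions:

    import torch
    import torch.nn.functional as F

    class DualBN2d(torch.nn.Module):
        # Separate batch-norm statistics for simple vs. complex augmentations,
        # so complex views do not corrupt the simple-view statistics.
        def __init__(self, num_features):
            super().__init__()
            self.bn_simple = torch.nn.BatchNorm2d(num_features)
            self.bn_complex = torch.nn.BatchNorm2d(num_features)

        def forward(self, x, complex_aug=False):
            return self.bn_complex(x) if complex_aug else self.bn_simple(x)

    def js_divergence(p_logits, q_logits):
        # Jensen-Shannon divergence between the predictive distributions of
        # the two augmented views, encouraging their joint learning.
        p = F.softmax(p_logits, dim=1)
        q = F.softmax(q_logits, dim=1)
        m = 0.5 * (p + q)
        return 0.5 * (F.kl_div(m.log(), p, reduction='batchmean')
                      + F.kl_div(m.log(), q, reduction='batchmean'))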
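For the FRR idea, a minimal PyTorch-style sketch in which a learned linear decoder must reconstruct the penultimate features from the logits; the module names and the weight lam are assumptions made for exposition:

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class FRRHead(nn.Module):
        # Hypothetical module: a linear decoder that reconstructs the
        # penultimate features from the logits, pushing the classifier to
        # retain diverse features in its outputs.
        def __init__(self, num_classes, feat_dim):
            super().__init__()
            self.decoder = nn.Linear(num_classes, feat_dim)

        def forward(self, logits, features):
            return F.mse_loss(self.decoder(logits), features)

    def frr_style_loss(backbone, classifier, frr_head, x, y, lam=0.1):
        features = backbone(x)            # penultimate representations
        logits = classifier(features)     # linear classification head
        return F.cross_entropy(logits, y) + lam * frr_head(logits, features)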
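Finally, a sketch of the aggregate step of a DART-style procedure, where branches trained with different augmentations are periodically averaged in weight space and restarted from the average; the function name and the schedule are hypothetical:

    import copy
    import torch

    @torch.no_grad()
    def aggregate_and_redistribute(models):
        # Average the parameters (and buffers) of the diverse branches, then
        # copy the average back so every branch continues training from a
        # common point in the shared loss basin.
        avg_state = copy.deepcopy(models[0].state_dict())
        for key in avg_state:
            stacked = torch.stack([m.state_dict()[key].float() for m in models])
            avg_state[key] = stacked.mean(dim=0).to(avg_state[key].dtype)
        for m in models:
            m.load_state_dict(avg_state)

    # Typical usage: call this every few epochs while each model trains on
    # its own augmentation (or domain), repeating diversify-aggregate cycles.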
dc.description.sponsorship: Google (Ph.D. fellowship), CII-SERB (PM fellowship), MHRD (Govt. of India)
dc.language.iso: en_US
dc.relation.ispartofseries: ET00614
dc.rights: I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.
dc.subject: Adversarial Machine Learning
dc.subject: Deep Learning
dc.subject: Domain Generalization
dc.subject: Self Supervised Learning
dc.subject: Adversarial Defenses
dc.subject: Adversarial Robustness
dc.subject: Deep Neural Networks
dc.subject: Bit Plane Feature Consistency Regularizer
dc.subject.classification: Research Subject Categories::TECHNOLOGY::Information technology::Computer science
dc.title: Efficient and Effective Algorithms for Improving the Robustness of Deep Neural Networks
dc.type: Thesis
dc.degree.name: PhD
dc.degree.level: Doctoral
dc.degree.grantor: Indian Institute of Science
dc.degree.discipline: Engineering

