Show simple item record

dc.contributor.advisor	Sundaram, Suresh
dc.contributor.author	Sikdar, Aniruddh
dc.date.accessioned	2025-12-30T04:23:00Z
dc.date.available	2025-12-30T04:23:00Z
dc.date.submitted	2025
dc.identifier.uri	https://etd.iisc.ac.in/handle/2005/7923
dc.description.abstract	Robotic systems are composed of interconnected modules such as planning, control, and perception, where inaccuracies in the perception module can trigger cascading failures throughout the entire pipeline. To operate reliably across diverse environments, including aerial and terrestrial domains, autonomous agents require robust and generalizable scene understanding. Semantic segmentation, a fundamental visual perception task addressed by deep learning, underpins critical applications such as autonomous navigation, medical imaging, land cover classification, and building detection. Despite significant advances, however, most models depend on visible-spectrum (RGB) data, which degrades in low-light or adverse weather conditions. These models are typically trained on static, idealized datasets under i.i.d. assumptions, so they generalize poorly to the dynamic, unpredictable conditions of real-world environments and suffer significant performance degradation. This thesis addresses these challenges by developing robust perception models for both long-range and ground-vehicle views, using RGB and complementary modalities, while keeping model complexity low enough for real-world deployment.

This thesis first develops a long-range perception system for the visible-spectrum (RGB) and Synthetic Aperture Radar (SAR) modalities, aimed at building segmentation and land cover classification, with particular emphasis on segmenting buildings with diverse and irregular geometric footprints in satellite imagery. The proposed Deep Multi-scale Aware Overcomplete Network (DeepMAO) integrates an overcomplete branch that captures fine structural details with an undercomplete (U-Net) branch that extracts coarse, semantically rich features. By learning fine-grained representations, the overcomplete branch improves the model's ability to handle SAR-specific challenges such as speckle noise. Additionally, a novel training strategy, Loss-Mix, is introduced to improve the representation of misclassified pixels by emphasizing them during optimization. Quantitative evaluation on public RGB and SAR datasets shows that DeepMAO outperforms state-of-the-art building segmentation models, with an overall performance improvement of 1–2.5%.

While complementary sensors such as SAR and infrared (IR) are robust to adverse conditions, they lack rich semantic context. To tackle this challenge, the second contribution of this thesis introduces multi-spectral fusion frameworks that leverage RGB and IR data to learn robust, semantically rich representations for improved perception in challenging environments. These frameworks support real-world deployment by capturing both shared and modality-specific features during multi-spectral training. Two fusion frameworks are proposed, both designed for scenarios with missing modalities: the Spectral-based Knowledge Distillation Network (SKD-Net) and the Optically-Guided Pixel-level Contrastive Learning Network (OGP-Net). SKD-Net uses a contrastive loss to preserve intra-modality knowledge and introduces a Gated Spectral Fusion module to enhance the distillation process. OGP-Net, a multi-modal fusion model for semantic segmentation, aligns features from multiple modalities into a shared latent space using pixel-level contrastive learning, preserving both semantic context and modality-specific details and ensuring robust performance under multi-modal and missing-modality conditions alike. Both models maintain low complexity: SKD-Net shows a 2.8% improvement, and OGP-Net a 2–4.5% gain, over state-of-the-art models. OGP-Net additionally enables efficient learning with faster convergence and maintains strong performance even with limited training data, making it well suited to real-world deployment.
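As an illustration of the pixel-level cross-modal contrastive alignment described above, here is a minimal sketch in PyTorch. The function name, tensor shapes, and temperature are illustrative assumptions, and only one alignment direction (RGB to IR) is shown; the thesis's actual OGP-Net formulation may differ.

```python
# Hypothetical sketch: InfoNCE-style pixel-level contrastive alignment
# between two modality feature maps. Not the thesis's implementation.
import torch
import torch.nn.functional as F

def pixel_contrastive_loss(feat_rgb, feat_ir, temperature=0.1):
    """Pulls RGB and IR embeddings at the same pixel together and pushes
    embeddings at other pixel locations apart.

    feat_rgb, feat_ir: (B, C, H, W) feature maps from modality encoders.
    """
    B, C, H, W = feat_rgb.shape
    # Flatten the spatial grid to (B, H*W, C) and L2-normalize each pixel.
    z_rgb = F.normalize(feat_rgb.flatten(2).transpose(1, 2), dim=-1)
    z_ir = F.normalize(feat_ir.flatten(2).transpose(1, 2), dim=-1)
    # Similarities between every RGB pixel and every IR pixel: (B, N, N).
    logits = torch.bmm(z_rgb, z_ir.transpose(1, 2)) / temperature
    # The positive for RGB pixel i is IR pixel i (same spatial location).
    target = torch.arange(H * W, device=feat_rgb.device).expand(B, -1)
    return F.cross_entropy(logits.reshape(B * H * W, H * W),
                           target.reshape(-1))

# Usage with dummy low-resolution features to keep H*W small:
loss = pixel_contrastive_loss(torch.randn(2, 64, 16, 16),
                              torch.randn(2, 64, 16, 16))
```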
In scenarios where complementary sensor inputs are unavailable, this thesis improves robustness to adverse weather conditions using only the RGB modality. As the third contribution, it proposes the Multi-Resolution Feature Perturbation (MRFP) technique, which targets domain generalization: training in simulated driving environments while generalizing to diverse real-world test domains. Training solely on simulated RGB images, MRFP perturbs domain-specific features through High-Resolution Feature Perturbation modules (HRFP and HRFP+) and Normalized Perturbation (NP+). HRFP and HRFP+ extract fine-grained features using a randomly initialized overcomplete autoencoder with a progressively decreasing receptive field, and these features serve as effective perturbations that reduce overfitting and enhance generalization. When trained on simulated images, MRFP delivers a 7.56% improvement over the baseline and consistently outperforms state-of-the-art models across diverse unseen domains such as fog and rain, without introducing additional parameters or inference-time overhead.

During real-world deployment, perception models operating in open-world settings may encounter novel semantic classes at inference, posing challenges to model generalization and reliability. As the final contribution, this thesis proposes the Toggle As You Tune (T-CAT) strategy, a prompt-learning framework designed to enable robust open-world perception. T-CAT tackles two key challenges: generalization to unseen semantic categories and resilience to domain shifts across applications such as remote sensing, agriculture, and autonomous driving. By adapting vision-language models (VLMs) through prompt tuning, T-CAT supports cross-domain semantic segmentation while preserving pre-trained knowledge. Evaluated on the MESS benchmark (19 datasets across 9 domains), T-CAT outperforms prior methods, achieving state-of-the-art results without increasing inference-time complexity and improving the base CAT-Seg model by 2.87% overall. It generalizes well across dataset sizes and transformer backbones, maintaining strong performance even in low-data regimes.

Overall, this thesis develops perception systems with improved robustness to domain shifts for robotic applications ranging from ground vehicles to long-range systems. The proposed methods support training across varied configurations: complementary sensors alone, multi-spectral setups combining complementary sensors with RGB, or a single modality (RGB), enhancing adaptability to real-world conditions. Extensive evaluations on real-world datasets confirm their effectiveness and practical applicability across diverse scenarios.	en_US
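As an illustration of the high-resolution feature perturbation idea described in the third contribution above, here is a minimal sketch: a randomly initialized, frozen module perturbs intermediate features during training only. The class name, channel expansion, and mixing weight alpha are illustrative assumptions, and channel-wise expansion stands in for the thesis's overcomplete design with a progressively decreasing receptive field.

```python
# Hypothetical sketch of training-time feature perturbation via a frozen,
# randomly initialized overcomplete mapping. Not the thesis's MRFP code.
import torch
import torch.nn as nn

class RandomOvercompletePerturbation(nn.Module):
    def __init__(self, channels, expansion=2, alpha=0.5):
        super().__init__()
        # Overcomplete mapping: more channels than the input, so the random
        # projection retains fine-grained spatial detail.
        self.net = nn.Sequential(
            nn.Conv2d(channels, channels * expansion, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels * expansion, channels, 3, padding=1),
        )
        # Freeze the weights: the perturbation stays random, adding no
        # trainable parameters.
        for p in self.net.parameters():
            p.requires_grad_(False)
        self.alpha = alpha

    def forward(self, feat):
        if not self.training:
            return feat  # skipped at eval time, so no inference overhead
        # Mix the original features with their randomly projected version.
        return (1 - self.alpha) * feat + self.alpha * self.net(feat)

# Usage: perturb an intermediate feature map during training.
layer = RandomOvercompletePerturbation(channels=64).train()
perturbed = layer(torch.randn(2, 64, 32, 32))
```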
dc.language.iso	en_US	en_US
dc.relation.ispartofseries	;ET01204
dc.rights	I grant Indian Institute of Science the right to archive and to make available my thesis or dissertation in whole or in part in all forms of media, now or hereafter known. I retain all proprietary rights, such as patent rights. I also retain the right to use in future works (such as articles or books) all or part of this thesis or dissertation.	en_US
dc.subject	Computer Vision	en_US
dc.subject	Deep Learning	en_US
dc.subject	Robotics	en_US
dc.subject	Autonomous Agents	en_US
dc.subject	Synthetic Aperture Radar	en_US
dc.subject	RGB	en_US
dc.subject	Deep Multi-scale Aware Overcomplete Network	en_US
dc.subject	Optically-Guided Pixel-level Contrastive Learning Network	en_US
dc.subject.classification	Research Subject Categories::INTERDISCIPLINARY RESEARCH AREAS	en_US
dc.title	Development of Perception Systems to Enhance Robustness for Robotic Applications	en_US
dc.type	Thesis	en_US
dc.degree.name	PhD	en_US
dc.degree.level	Doctoral	en_US
dc.degree.grantor	Indian Institute of Science	en_US
dc.degree.discipline	Engineering	en_US

