    •   etd@IISc
    • Division of Mechanical Sciences
    • Aerospace Engineering (AE)
    • View Item

    Learning Action Priors for Deep Visual Predictions

    Thesis full text (69.65Mb)
    Author
    Sarkar, Meenakshi
    Abstract
    This thesis addresses the critical challenge of visual prediction in mobile robotics, particularly focusing on scenarios where cameras mounted on autonomous robots must navigate dynamic environments with human presence. While recent advances in artificial intelligence and machine learning have revolutionized natural language processing and generative AI, similar breakthroughs in video prediction for mobile platforms remain elusive due to the inherent complexity of disentangling robot motion from environmental dynamics. We identify a fundamental gap in existing approaches: the failure to explicitly incorporate robot control actions into visual prediction frameworks. In our initial work, we proposed the Velocity Acceleration Network (VANet), designed to extract and disentangle robot motion dynamics from visual data using motion flow encoders. While VANet represented progress in understanding the interplay between robot movement and visual information, it still relied on inferring motion effects from raw data rather than directly incorporating control signals. To address this limitation, we introduce the Robot Autonomous Motion Dataset (RoAM), a novel open-source stereo-image dataset captured using a Turtlebot3 Burger robot equipped with a Zed mini stereo camera. Unlike existing datasets, RoAM provides synchronized control action data alongside visual information, 2D LiDAR scans, IMU readings, and odometry data, creating a comprehensive multimodal resource for action-conditioned prediction tasks. Building on this dataset, we present two complementary approaches to action-conditioned video prediction. First, we develop deterministic frameworks, ACPNet and ACVG (Action Conditioned Video Generation), that explicitly condition predicted image frames on robot control actions, resulting in more physically consistent video predictions.
    We rigorously benchmark these architectures against state-of-the-art models, demonstrating superior performance when leveraging control action data. We then advance two theoretical frameworks for learning stochastic priors that simultaneously predict future images and actions. (i) Conditional Independence: Under this assumption, we model image-action pairs as extended system states generated from a shared latent stochastic process. We implement this approach through two models: VG-LeAP, a variational generative framework, and RAFI, built on sparsely conditioned flow matching, demonstrating the versatility of this principle across different architectural paradigms. (ii) Causal Dependence: This framework models images and actions as causally interlinked nodes, reflecting real-world scenarios where robots take actions based on current observations and then observe subsequent states as consequences. We implement this approach through Causal-LeAP, a variational generative framework that learns separate but conditionally dependent latent priors for images and actions. Our comprehensive evaluation demonstrates that explicitly incorporating control actions significantly improves prediction accuracy while maintaining computational efficiency suitable for deployment on resource-constrained robotic platforms. This research bridges critical gaps between visual perception, motion planning, and control theory, establishing new foundations for intelligent autonomous systems that can effectively navigate and interact in dynamic human environments.
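    The core idea of action-conditioned prediction described above can be illustrated with a toy sketch: encode the current frame, concatenate the robot's control action (e.g., linear and angular velocity), and decode a prediction of the next frame. This is only a minimal illustration with randomly initialised weights and hypothetical dimensions; it is not the thesis's actual ACPNet/ACVG architecture.

    ```python
    import numpy as np

    rng = np.random.default_rng(0)

    H, W = 8, 8          # toy image resolution (hypothetical)
    D_z, D_a = 16, 2     # latent size; action = (linear, angular velocity)

    # Random matrices stand in for learned encoder/decoder networks.
    W_enc = rng.normal(size=(H * W, D_z)) * 0.1
    W_dec = rng.normal(size=(D_z + D_a, H * W)) * 0.1

    def predict_next_frame(frame, action):
        """Encode the frame, concatenate the control action, decode a prediction."""
        z = np.tanh(frame.reshape(-1) @ W_enc)     # visual latent
        z_a = np.concatenate([z, action])          # action conditioning
        return np.tanh(z_a @ W_dec).reshape(H, W)  # predicted next frame

    frame_t = rng.random((H, W))
    action_t = np.array([0.22, 0.0])               # forward velocity, no turn
    frame_t1 = predict_next_frame(frame_t, action_t)
    print(frame_t1.shape)  # (8, 8)
    ```

    The point of the sketch is the conditioning step: the action vector enters the decoder directly, so the prediction depends on the commanded motion rather than having to infer it from pixels alone.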
    URI
    https://etd.iisc.ac.in/handle/2005/7112
    Collections
    • Aerospace Engineering (AE) [435]

    etd@IISc is a joint service of SERC & J R D Tata Memorial (JRDTML) Library || Powered by DSpace software || DuraSpace
    Contact Us | Send Feedback | Thesis Templates
    Theme by Atmire NV