Fast and Scalable Algorithms for Intelligent Routing of Autonomous Marine Agents
Abstract
Autonomous marine agents play a pivotal role in diverse ocean applications, serving as indispensable instruments for acquiring crucial environmental information. They are used to explore and monitor harsh environments, e.g., for mapping ocean topography, studying coral reefs, search and rescue, and structural monitoring of oil and gas installations. In naval security, these agents are used for the surveillance and strategic monitoring of maritime activities. Consequently, the development of robust techniques that allow these agents to navigate uncertain environments optimally is essential to reduce operational costs.
The path planning problem is as follows. An autonomous marine agent must optimally traverse from a given start location to a given target location in a stochastic dynamic velocity field, such as ocean currents, while avoiding obstacles or restricted regions in the flow. A key challenge is that the agent is strongly advected by the flow. Optimality may refer to minimizing expected travel time, energy consumption, or risk of failure, or to maximizing data collection. The goal of this thesis is to develop efficient, fast, and scalable algorithms for optimal planning and onboard routing for autonomous marine agents in these stochastic dynamic environments.
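For concreteness, the time-optimal variant can be stated as a stochastic optimal control problem (a hedged sketch in standard notation; the symbols below are illustrative and not fixed by this abstract):

\min_{u(\cdot)} \; \mathbb{E}[T] \quad \text{s.t.} \quad \frac{\mathrm{d}\mathbf{X}}{\mathrm{d}t} = \mathbf{V}(\mathbf{X}(t), t; \omega) + u(t)\,\hat{\mathbf{h}}(t), \qquad \mathbf{X}(0) = \mathbf{x}_s, \quad \mathbf{X}(T) = \mathbf{x}_f,

where \mathbf{V} is a realization \omega of the stochastic dynamic flow, u(t) is the agent's relative speed, and \hat{\mathbf{h}}(t) its heading.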
The thesis comprises five works organized into two parts based on the solution approach. In the first part, we model the path planning problem as a Markov Decision Process (MDP) and compute an optimal policy. However, the key challenge here is that solving an MDP can be prohibitively expensive for large state and action spaces. To overcome this challenge, we either approximate the optimal policy or accelerate the computation using GPUs.
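To make the scaling bottleneck concrete, consider a textbook tabular value-iteration solver (a minimal sketch, not the thesis implementation; the transition matrices P, reward vectors R, and discount factor are placeholder inputs):

import numpy as np

# Hedged sketch: tabular value iteration for a generic MDP.
# P[a] is the |S| x |S| transition matrix for action a; R[a] is the
# expected one-step reward vector. All shapes and values are illustrative.
def value_iteration(P, R, gamma=0.99, tol=1e-6):
    n_states = R[0].shape[0]
    V = np.zeros(n_states)
    while True:
        # One Bellman backup per action; the cost grows as O(|A| * |S|^2),
        # which becomes prohibitive on fine spatio-temporal grids.
        Q = np.stack([R[a] + gamma * P[a] @ V for a in range(len(P))])
        V_new = Q.max(axis=0)
        if np.max(np.abs(V_new - V)) < tol:
            return V_new, Q.argmax(axis=0)  # optimal value function and policy
        V = V_new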
Physics-driven Q-learning for onboard routing: We develop a physics-driven Q-learning method that approximates the optimal policy and updates it in-mission for onboard routing. This involves learning an initial action-value function from a distribution of exact time-optimal paths predicted by Hamilton-Jacobi level-set partial differential equations (HJLS PDEs). Flow data collected by onboard sensors are used to obtain a posterior estimate of the environment. The initial approximate policy is then refined in-mission by performing epsilon-greedy Q-learning in simulated posterior environments.
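A minimal sketch of this refinement step follows, assuming a generic simulated-environment interface with reset()/step()/sample_action() and illustrative hyperparameters (none of these names come from the thesis):

import numpy as np

# Hedged sketch of the in-mission refinement loop: epsilon-greedy
# Q-learning in a simulated posterior environment. Q is initialized
# from the physics-informed (HJLS PDE) action-value function.
def refine_policy(Q, env, episodes=500, alpha=0.1, gamma=0.99, eps=0.1):
    for _ in range(episodes):
        s = env.reset()  # sample a posterior flow realization
        done = False
        while not done:
            # Epsilon-greedy exploration around the current policy.
            a = env.sample_action() if np.random.rand() < eps else int(np.argmax(Q[s]))
            s_next, r, done = env.step(a)
            # Standard temporal-difference update of the action-value table.
            Q[s, a] += alpha * (r + gamma * np.max(Q[s_next]) * (not done) - Q[s, a])
            s = s_next
    return Q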
GPU-accelerated path planning: We introduce an efficient end-to-end GPU-accelerated algorithm that builds the MDP model (computing transition probabilities and expected one-step rewards) and solves the MDP to compute an exact optimal policy. We develop methodological and algorithmic solutions to overcome the limited global memory of GPUs: a dynamic reduced-order representation of the ocean flows, exploitation of the sparse structure of the state transition probability matrix, and a neighboring-subgrid concept that saves memory and reduces computational effort. We achieve significant speedups compared to conventional sequential computation.
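The sparsity argument can be illustrated with a CPU analog of one Bellman backup (a hedged sketch; the thesis runs the analogous kernels on GPUs, and P_sparse and R are placeholder inputs):

import numpy as np
from scipy import sparse

# Hedged sketch exploiting sparsity in one Bellman backup. P_sparse[a]
# is a CSR matrix: each state transitions only to a small neighboring
# subgrid of states, so most entries are zero and a dense |S| x |S|
# matrix never needs to occupy limited GPU global memory.
def sparse_backup(P_sparse, R, V, gamma=0.99):
    Q = np.stack([R[a] + gamma * (P_sparse[a] @ V) for a in range(len(P_sparse))])
    return Q.max(axis=0), Q.argmax(axis=0)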
Multi-objective GPU-accelerated path planning: We extend our GPU-accelerated planner to multi-objective optimization problems, such as minimizing both the expected mission completion time and energy consumption. The MDPs are modeled with scalarized rewards over multiple objectives. The solver enables us to compute optimal operating curves in a fraction of the time required by traditional solvers and to analyze complex multi-objective missions.
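A minimal sketch of the scalarization idea (the weight alpha, the reward names, and the solve_mdp helper are illustrative assumptions, not the thesis's code):

import numpy as np

# Hedged sketch of reward scalarization for a two-objective planner.
# r_time and r_energy are per-transition costs; sweeping the weight
# alpha and re-solving the MDP traces out an operating curve of
# expected mission time versus expected energy consumption.
def scalarized_reward(r_time, r_energy, alpha):
    return alpha * r_time + (1.0 - alpha) * r_energy

# e.g., one hypothetical solve per weight to obtain the operating curve:
# policies = [solve_mdp(scalarized_reward(r_t, r_e, a)) for a in np.linspace(0, 1, 11)]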
In the second part, we convert the optimal path planning problem into a supervised learning problem via sequence modeling. This approach involves learning optimal action sequences from available environment information and expert trajectories. It also avoids re-computing optimal policies during onboard routing.
Intelligent onboard routing using decision transformers: We develop a novel deep learning method based on the decision transformer (a decoder-only model) for onboard routing of autonomous marine agents. Training data are obtained from the aforementioned HJLS PDE and MDP solvers and processed into sequences of states, actions, and returns. The model is trained autoregressively on these sequences and then tested in different environment settings. We demonstrate that (i) a trained agent learns to infer the surrounding flow and performs optimal onboard routing when the agent's state estimation is accurate, and (ii) a trained agent is robust to limited noise in state transitions and is capable of reaching target locations in completely new flow scenarios.
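The trajectory preprocessing can be sketched as follows (a hedged illustration; the variable names and exact tokenization are assumptions, not the thesis's pipeline):

import numpy as np

# Hedged sketch of trajectory preprocessing for decision-transformer
# training: expert paths from the HJLS PDE / MDP solvers become
# interleaved (return-to-go, state, action) sequences.
def to_sequence(states, actions, rewards):
    # Return-to-go at step t is the sum of future rewards from t onward;
    # conditioning on it lets the model be prompted with a desired return.
    rtg = np.cumsum(rewards[::-1])[::-1]
    return list(zip(rtg, states, actions))

# The model is then trained autoregressively to predict action a_t from
# the prefix (rtg_0, s_0, a_0, ..., rtg_t, s_t), e.g., with a
# cross-entropy loss over the action vocabulary.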
Path planning with environment encoders and action decoders: We propose a path planning translation transformer model (encoder-decoder). We model the problem as a sequence-to-sequence translation task, where the source sequence is the agent's knowledge representation of the uncertain environmental flow and the target sequence is the optimal sequence of actions the agent must execute. We demonstrate that a trained transformer model can predict near-optimal paths for unseen flow realizations and obstacle configurations in a fraction of the time required by traditional planners, and we validate its generalization to unseen obstacle configurations. We also analyze the predictions of both transformer models, viz., decoder-only and encoder-decoder, and explain the inner mechanics of learning through a novel visualization of the self-attention of actions on the predicted trajectories.
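A minimal sketch of such an encoder-decoder setup in PyTorch is given below (all dimensions, vocabulary sizes, and layer counts are illustrative placeholders, and positional encodings are omitted for brevity; this is not the thesis's architecture):

import torch
import torch.nn as nn

# Hedged sketch of the sequence-to-sequence setup: the encoder ingests a
# tokenized representation of the agent's environment knowledge, and the
# decoder emits the action sequence.
class PathTranslator(nn.Module):
    def __init__(self, n_env_tokens=256, n_actions=8, d_model=128):
        super().__init__()
        self.src_embed = nn.Embedding(n_env_tokens, d_model)
        self.tgt_embed = nn.Embedding(n_actions + 1, d_model)  # +1 for a start token
        self.transformer = nn.Transformer(d_model=d_model, nhead=4,
                                          num_encoder_layers=3,
                                          num_decoder_layers=3,
                                          batch_first=True)
        self.head = nn.Linear(d_model, n_actions)

    def forward(self, src_tokens, tgt_tokens):
        # Causal mask so each predicted action attends only to earlier actions.
        mask = self.transformer.generate_square_subsequent_mask(
            tgt_tokens.size(1)).to(tgt_tokens.device)
        h = self.transformer(self.src_embed(src_tokens),
                             self.tgt_embed(tgt_tokens),
                             tgt_mask=mask)
        return self.head(h)  # logits over the next action at each step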