Time-Series Prediction for Intent-Aware Robot Learning
Abstract
The study of time-series prediction and intent modeling forms a fundamental component
of intelligent and adaptive systems in robotics and human–machine interaction. Accurately
modeling temporal dependencies and anticipating user intent are essential for achieving seamless
collaboration, safety, and autonomy in systems such as collaborative robots, eXtended
Reality (XR) interfaces, and virtual pilot assistance platforms. Despite substantial advances
in modeling and control, existing approaches often struggle to represent the stochastic, nonlinear,
and context-sensitive characteristics of human behavior in dynamic environments.
This thesis addresses these challenges by developing and investigating probabilistic and
learning-based frameworks for time-series prediction across diverse interaction scenarios.
The research begins with probabilistic formulations for intent prediction during human–robot
handover. A framework based on Maximum Entropy Deep Inverse Reinforcement
Learning (MEDIRL) was developed to infer human intent from partial hand trajectories by
modeling motion as a reward-driven process. This approach was extended to a multimodal
setting that integrates hand motion and eye-gaze, thereby enhancing robustness and improving
early intent recognition. The inclusion of gaze information enabled the model to capture
attention-driven behavior and reduce ambiguity in predicting users’ intended targets during
human–robot handover.
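A minimal sketch of goal inference in this spirit follows. It is not the MEDIRL model from the thesis. Instead of a learned deep reward, it assumes path length as the cost and straight-line distance as the cost-to-go, and applies the classic maximum-entropy goal posterior P(g | ξ_{A→B}) ∝ P(g) exp(−β[C(ξ_{A→B}) + d(B, g) − d(A, g)]). It then fuses a hypothetical von Mises–Fisher-style gaze likelihood, assuming hand motion and gaze are conditionally independent given the target. The function names, β, and κ are illustrative.

```python
import numpy as np


def _normalize(log_p):
    # Numerically stable softmax over unnormalized log-probabilities
    p = np.exp(log_p - log_p.max())
    return p / p.sum()


def goal_posterior(traj, goals, prior=None, beta=1.0):
    """MaxEnt-style goal posterior from a partial hand trajectory.

    Assumption: cost = path length, and the cost-to-go from a point to a goal
    is approximated by straight-line distance (not a learned reward).
    """
    traj = np.asarray(traj, dtype=float)
    goals = np.asarray(goals, dtype=float)
    if prior is None:
        prior = np.full(len(goals), 1.0 / len(goals))
    start, current = traj[0], traj[-1]
    observed_cost = np.linalg.norm(np.diff(traj, axis=0), axis=1).sum()
    d_now = np.linalg.norm(goals - current, axis=1)
    d_start = np.linalg.norm(goals - start, axis=1)
    # Goals whose remaining cost is low relative to the cost from the start gain mass
    log_post = np.log(prior) - beta * (observed_cost + d_now - d_start)
    return _normalize(log_post)


def gaze_log_likelihood(eye, gaze_dir, goals, kappa=5.0):
    """Hypothetical gaze model: log-likelihood grows with the cosine between
    the gaze ray and the eye-to-goal direction."""
    goals = np.asarray(goals, dtype=float)
    to_goal = goals - np.asarray(eye, dtype=float)
    to_goal /= np.linalg.norm(to_goal, axis=1, keepdims=True)
    g = np.asarray(gaze_dir, dtype=float)
    return kappa * (to_goal @ (g / np.linalg.norm(g)))


def fused_posterior(traj, eye, gaze_dir, goals, beta=1.0, kappa=5.0):
    # Assumes hand motion and gaze are conditionally independent given the goal
    log_hand = np.log(goal_posterior(traj, goals, beta=beta))
    return _normalize(log_hand + gaze_log_likelihood(eye, gaze_dir, goals, kappa))
```

Because the observed-path cost is shared by all goals, it cancels after normalization; it is kept only to mirror the formula. When the hand trajectory alone is ambiguous (equidistant from two targets), the gaze term is what separates them.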
Building on these foundations, the thesis advances to multi-agent time-series prediction
for robot-assisted manufacturing. A Feature-Based Bayesian Interaction Primitive (FBIP)
formulation was proposed to extend classical Bayesian Interaction Primitives by embedding task-relevant feature functions within the probabilistic model. This framework enabled
prediction of the coordinated motion patterns of human and robot, and its probabilistic
representation captured the dependencies between the two agents.
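For context, the sketch below shows the classical Interaction Primitive step that FBIP builds on, not FBIP itself. Each degree of freedom is modeled as a weighted sum of normalized Gaussian basis functions over phase. A joint Gaussian over human and robot weights is estimated from paired demonstrations, and the robot's weights are inferred by Gaussian conditioning on a partially observed human trajectory. FBIP's task-relevant feature functions, phase estimation, and multi-DOF handling are omitted. The basis count, width, noise levels, and all function names are assumptions for illustration.

```python
import numpy as np

N_BASIS = 8  # assumed number of basis functions per degree of freedom


def rbf_basis(phase, n_basis=N_BASIS, width=0.02):
    # Normalized Gaussian radial basis functions evaluated at each phase in [0, 1]
    centers = np.linspace(0.0, 1.0, n_basis)
    phi = np.exp(-((phase[:, None] - centers[None, :]) ** 2) / (2.0 * width))
    return phi / phi.sum(axis=1, keepdims=True)


def fit_weights(traj, ridge=1e-6):
    # Ridge-regularized least-squares weights for a 1D trajectory sampled uniformly in phase
    Phi = rbf_basis(np.linspace(0.0, 1.0, len(traj)))
    return np.linalg.solve(Phi.T @ Phi + ridge * np.eye(N_BASIS), Phi.T @ traj)


def learn_prior(human_demos, robot_demos):
    # Stack [human weights, robot weights] per demonstration and fit a joint Gaussian
    W = np.array([np.concatenate([fit_weights(h), fit_weights(r)])
                  for h, r in zip(human_demos, robot_demos)])
    return W.mean(axis=0), np.cov(W, rowvar=False) + 1e-6 * np.eye(W.shape[1])


def condition_on_human(mu, cov, obs_phase, obs_human, obs_noise=1e-4):
    # Observation model: only the human half of the weight vector is observed
    Phi_obs = rbf_basis(obs_phase)
    H = np.hstack([Phi_obs, np.zeros_like(Phi_obs)])
    S = H @ cov @ H.T + obs_noise * np.eye(len(obs_phase))
    K = np.linalg.solve(S, H @ cov).T  # Kalman gain: cov H^T S^-1
    mu_post = mu + K @ (obs_human - H @ mu)
    cov_post = cov - K @ H @ cov
    return mu_post, cov_post


def predict_robot(mu_post, n_steps):
    # Reconstruct the robot trajectory from the posterior mean of its weights
    return rbf_basis(np.linspace(0.0, 1.0, n_steps)) @ mu_post[N_BASIS:]
```

The inter-agent dependency lives entirely in the off-diagonal blocks of the joint covariance. Observing only the human's early motion shifts the robot's weights through those blocks.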
To address long-horizon temporal forecasting, the thesis introduces BiPTraP (Bayesian
Interaction Primitives with Transformer for Predicting Time-Series Data), a hybrid model
that combines Bayesian state estimation with Transformer-based self-attention. Trajectories
were represented in a basis space, updated probabilistically using ensemble Kalman filtering,
and encoded as patches for Transformer-based temporal reasoning. BiPTraP demonstrated
superior accuracy and stability in predicting non-stationary time-series data, outperforming
conventional recurrent and probabilistic baselines.
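Two of the BiPTraP ingredients named above can be sketched in isolation. The sketch assumes a linear observation operator H that maps basis weights to observed samples, for example basis functions evaluated at the observed phases. It shows a stochastic ensemble Kalman filter analysis step with perturbed observations, and the splitting of a series into overlapping patches for a Transformer encoder. The Transformer itself, the phase handling, and the way the thesis couples the two stages are not shown. Function names, noise level, and patch sizes are illustrative assumptions.

```python
import numpy as np


def enkf_update(ensemble, H, y, obs_noise, rng):
    """Stochastic EnKF analysis step on an ensemble of basis-weight vectors.

    ensemble: (N, D) samples, H: (M, D) linear observation operator, y: (M,) observation.
    """
    n = ensemble.shape[0]
    X = ensemble - ensemble.mean(axis=0)           # ensemble anomalies
    HX = X @ H.T                                   # anomalies in observation space, (N, M)
    P_yy = HX.T @ HX / (n - 1) + obs_noise * np.eye(len(y))
    P_xy = X.T @ HX / (n - 1)                      # state-observation cross-covariance, (D, M)
    K = np.linalg.solve(P_yy, P_xy.T).T            # Kalman gain P_xy P_yy^-1, (D, M)
    # Each member assimilates its own noisy copy of the observation
    y_pert = y + rng.normal(0.0, np.sqrt(obs_noise), size=(n, len(y)))
    return ensemble + (y_pert - ensemble @ H.T) @ K.T


def to_patches(series, patch_len, stride):
    # Overlapping windows of length patch_len taken every stride steps (trailing remainder dropped)
    n_patches = 1 + (len(series) - patch_len) // stride
    return np.stack([series[i * stride:i * stride + patch_len] for i in range(n_patches)])
```

For example, `to_patches(np.arange(10), 4, 2)` yields four windows starting at indices 0, 2, 4, and 6. The ensemble form avoids propagating an explicit covariance matrix: the gain is estimated from the spread of the samples themselves.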
The final part of the thesis explores applications of time-series prediction in two contrasting
domains. In XR interaction, a Sampling-based Maximum Entropy IRL (SMEIRL)
framework was introduced for rapid target prediction during virtual and mixed-reality pointing
tasks, achieving high prediction accuracy with efficient sampling. In aviation, state-of-the-art
sequence models (LSTM, Mamba, Jamba, and PatchTST) were benchmarked for
pilot control input prediction using flight simulator data. PatchTST achieved the lowest
prediction error, highlighting its potential for developing intelligent virtual pilot assistance
systems.
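The sampling idea behind SMEIRL can be illustrated with a simplified sketch, which is not the thesis's formulation. Maximum entropy is applied over a finite set of sampled paths, so each path's probability is proportional to exp(−β·cost). The posterior over a pointing target then compares the sampled partition function from the current cursor position with the one from the start position: P(g | prefix) ∝ exp(−β C(prefix)) Z(current → g) / Z(start → g). The following are all assumptions: path length as the cost, straight lines with Gaussian waypoint noise as the sampler, a uniform target prior, and every parameter value.

```python
import numpy as np


def sample_paths(start, goal, n_samples, n_points, sigma, rng):
    # Straight line from start to goal with Gaussian noise on the interior waypoints
    t = np.linspace(0.0, 1.0, n_points)[None, :, None]
    line = start + t * (goal - start)                      # (1, n_points, dim)
    noise = rng.normal(0.0, sigma, size=(n_samples, n_points, len(start)))
    noise[:, 0] = 0.0
    noise[:, -1] = 0.0                                     # endpoints stay fixed
    return line + noise


def path_cost(paths):
    # Assumed cost: total Euclidean length of each path, shape (n_paths,)
    return np.linalg.norm(np.diff(paths, axis=1), axis=2).sum(axis=1)


def log_partition(start, goal, beta, n_samples, n_points, sigma, rng):
    # log sum_k exp(-beta * cost_k) over the sampled paths, computed stably
    a = -beta * path_cost(sample_paths(start, goal, n_samples, n_points, sigma, rng))
    return a.max() + np.log(np.exp(a - a.max()).sum())


def predict_target(cursor_path, targets, beta=5.0, n_samples=256, n_points=10, sigma=0.02, seed=0):
    rng = np.random.default_rng(seed)
    cursor_path = np.asarray(cursor_path, dtype=float)
    targets = np.asarray(targets, dtype=float)
    start, current = cursor_path[0], cursor_path[-1]
    observed = -beta * path_cost(cursor_path[None])[0]
    args = (beta, n_samples, n_points, sigma, rng)
    log_post = np.array([observed + log_partition(current, g, *args) - log_partition(start, g, *args)
                         for g in targets])
    p = np.exp(log_post - log_post.max())   # uniform target prior assumed
    return p / p.sum()
```

A cursor that has moved 0.3 units toward a target at (1, 0) and away from one at (−1, 0) gives most of the posterior mass to the first target. The number of samples per target trades prediction latency against the variance of the partition-function estimate.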
Overall, the thesis presents a progression from probabilistic intent inference to scalable
Transformer-based architectures for time-series prediction. The proposed methods demonstrate
early intent recognition, multimodal fusion, and long-horizon forecasting capabilities
across human–robot, XR, and flight control applications, contributing to the broader goal of
enabling predictive, adaptive, and intent-aware autonomous systems.

