Model-based Safe Deep Reinforcement Learning and Empirical Analysis of Safety via Attribution
Abstract
During the initial iterations of training, most Reinforcement Learning (RL) algorithms have the
agent perform a significant number of random exploratory steps. In the real world this limits
their practicality, since such exploration can lead to potentially dangerous behavior. Safe
exploration is therefore a critical issue when applying RL algorithms in the real world. The
problem is well studied in the literature under the Constrained Markov Decision Process (CMDP)
framework, in which state transitions incur single-stage costs in addition to single-stage
rewards. The prescribed cost function maps undesirable behavior at any given time step to a
scalar value. The aim is then to find a feasible policy that maximizes reward returns while
keeping cost returns below a prescribed threshold, during training as well as at deployment.
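For concreteness, the constrained objective can be stated as follows; the symbols $J_R$, $J_C$,
$\pi_\theta$, $\gamma$, and $d$ are our own notational assumptions, not fixed by the abstract:
\[
\max_{\theta}\; J_R(\pi_\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, r(s_t, a_t)\Big]
\quad \text{s.t.} \quad
J_C(\pi_\theta) = \mathbb{E}_{\tau \sim \pi_\theta}\Big[\sum_{t=0}^{\infty} \gamma^{t}\, c(s_t, a_t)\Big] \le d,
\]
where $r$ and $c$ denote the single-stage reward and cost and $d$ is the prescribed cost threshold.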
We propose a novel On-policy Model-based Safe Deep RL algorithm in which we learn the
transition dynamics of the environment online and find a feasible optimal policy using
Lagrangian Relaxation-based Proximal Policy Optimization. This combination of transition
dynamics learning and a safety-promoting RL algorithm requires 3-4 times fewer environment
interactions and incurs fewer cumulative hazard violations than the corresponding model-free
approach. To tackle the epistemic and aleatoric uncertainty encountered while learning the
environment model, we use an ensemble of neural networks with different initializations. We
present our results on a challenging Safe Reinforcement Learning benchmark, the OpenAI Safety Gym.
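As a sketch of how the constraint can be folded into the policy update (again under the assumed
notation above), Lagrangian relaxation converts the constrained problem into an unconstrained
saddle-point problem:
\[
\max_{\theta}\;\min_{\lambda \ge 0}\;\; J_R(\pi_\theta) \;-\; \lambda\big(J_C(\pi_\theta) - d\big),
\]
where $\theta$ is updated with the usual clipped PPO objective and the multiplier $\lambda$ is
increased when the cost return exceeds the threshold $d$ and decreased (while kept non-negative)
otherwise.

For the environment model, a minimal PyTorch-style sketch of an ensemble with different random
initializations could look as follows; the architecture, dimensions, and helper names are
illustrative assumptions rather than the paper's implementation:

```python
import torch
import torch.nn as nn

class DynamicsModel(nn.Module):
    """One ensemble member: predicts the next state from (state, action)."""
    def __init__(self, state_dim, action_dim, hidden=128):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + action_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, state_dim),
        )

    def forward(self, state, action):
        return self.net(torch.cat([state, action], dim=-1))

# Each member gets its own random initialization; disagreement across members
# serves as a proxy for epistemic uncertainty in the learned dynamics.
ensemble = [DynamicsModel(state_dim=60, action_dim=2) for _ in range(5)]

def predict(state, action):
    preds = torch.stack([m(state, action) for m in ensemble])
    return preds.mean(dim=0), preds.std(dim=0)  # prediction, disagreement
```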
In addition, we perform an attribution analysis of the actions taken by the Deep Neural
Network-based policy at each time step. This analysis helps us to:
1. Identify the feature in the state representation that is chiefly responsible for the current
action.
2. Provide empirical evidence of the safety-aware agent's ability to deal with hazards in the
environment, provided that hazard information is present in the state representation.
In order to perform the above analysis, we assume that the state representation carries
meaningful information about hazards and goals. We then compute an attribution vector of the
same dimension as the state using a well-known attribution technique, Integrated Gradients.
The resulting attribution vector gives the importance of each state feature for the current
action.
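As an illustration of the attribution step, a bare-bones Integrated Gradients computation over
the state vector might look like the following sketch (PyTorch); the policy interface, the
all-zero baseline, and the number of interpolation steps are assumptions for this example, and
the policy is treated as a deterministic map from state to action for simplicity:

```python
import torch

def integrated_gradients(policy, state, baseline=None, steps=50):
    """Approximate Integrated Gradients of the policy output w.r.t. the state.

    Returns an attribution vector of the same dimension as `state`; each entry
    estimates how much that feature contributed to the (summed) action output
    relative to the baseline.
    """
    if baseline is None:
        baseline = torch.zeros_like(state)  # common, though not mandatory, choice

    # Points interpolated along the straight line from baseline to the actual state.
    alphas = torch.linspace(0.0, 1.0, steps).unsqueeze(1)
    interpolated = baseline + alphas * (state - baseline)
    interpolated.requires_grad_(True)

    # Gradient of the policy output at every interpolation point.
    actions = policy(interpolated)
    grads = torch.autograd.grad(actions.sum(), interpolated)[0]

    # Riemann approximation of the path integral, scaled by the input difference.
    return (state - baseline) * grads.mean(dim=0)
```

The entry of the returned vector with the largest magnitude then points to the state feature
(for example, a hazard-related component) that contributed most to the chosen action.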