Distillation of RL Policies with Formal Guarantees via Variational Abstraction of Markov Decision Processes.
Florent Delgrange, Ann Nowé, Guillermo A. Pérez. Published in: AAAI (2022)
Keyphrases
- markov decision processes
- optimal policy
- decision theoretic planning
- markov decision process
- reinforcement learning
- macro actions
- state abstraction
- temporally extended
- state space
- initial state
- finite state
- discounted reward
- average cost
- decision processes
- reward function
- total reward
- policy iteration
- partially observable markov decision processes
- decision problems
- finite horizon
- dynamic programming
- reinforcement learning algorithms
- control policies
- decentralized control
- action space
- markov decision problems
- state and action spaces
- transition matrices
- planning under uncertainty
- infinite horizon
- average reward
- hierarchical reinforcement learning
- stationary policies
- model based reinforcement learning
- partially observable
- expected reward
- long run
- markov games
- machine learning
- policy evaluation
- factored mdps
- control policy
- stochastic games
- action sets
- least squares
- learning algorithm
- semi markov decision processes