Distillation of RL Policies with Formal Guarantees via Variational Abstraction of Markov Decision Processes (Technical Report).
Florent Delgrange, Ann Nowé, Guillermo A. Pérez
Published in: CoRR (2021)
Keyphrases
- markov decision processes
- technical report
- optimal policy
- decision theoretic planning
- markov decision process
- reinforcement learning
- macro actions
- state abstraction
- initial state
- average cost
- temporally extended
- reward function
- finite state
- discounted reward
- policy iteration
- state space
- decision processes
- decentralized control
- infinite horizon
- finite horizon
- decision problems
- reinforcement learning algorithms
- dynamic programming
- total reward
- transition matrices
- average reward
- action space
- long run
- partially observable markov decision processes
- expected reward
- control policies
- markov decision problems
- planning under uncertainty
- sufficient conditions
- stationary policies
- partially observable
- hierarchical reinforcement learning
- state and action spaces
- optical flow
- model based reinforcement learning
- semi markov decision processes
- factored mdps
- action sets
- function approximation
- markov games
- policy evaluation
- action selection
- stochastic games