Conservative Q-Improvement: Reinforcement Learning for an Interpretable Decision-Tree Policy.
Aaron M. Roth, Nicholay Topin, Pooyan Jamshidi, Manuela Veloso
Published in: CoRR (2019)
Keyphrases
- reinforcement learning
- decision trees
- optimal policy
- policy search
- markov decision process
- state space
- action selection
- reinforcement learning problems
- classification rules
- machine learning
- approximate dynamic programming
- partially observable environments
- reinforcement learning algorithms
- function approximation
- markov decision processes
- actor critic
- markov decision problems
- continuous state spaces
- partially observable
- control policies
- decision tree induction
- function approximators
- state and action spaces
- predictive accuracy
- policy gradient
- action space
- machine learning algorithms
- dynamic programming
- long run
- policy evaluation
- continuous state
- state action
- decision tree algorithm
- control policy
- partially observable markov decision processes
- infinite horizon
- transition model
- partially observable domains
- decision tree classifiers
- information gain
- decision problems
- sufficient conditions
- supervised learning
- learning process
- training data
- feature selection