Problem Dependent Reinforcement Learning Bounds Which Can Identify Bandit Structure in MDPs.
Andrea Zanette, Emma Brunskill
Published in: CoRR (2019)
Keyphrases
- reinforcement learning
- Markov decision processes
- state space
- optimal policy
- supervised learning
- worst case
- function approximation
- control problems
- dynamic programming
- policy search
- multi-armed bandit
- optimal solution
- search space
- machine learning
- fixed point
- policy evaluation
- model-based reinforcement learning
- continuous state and action spaces