Problem Dependent Reinforcement Learning Bounds Which Can Identify Bandit Structure in MDPs.

Andrea Zanette, Emma Brunskill

Published in: CoRR (2019)

Keyphrases

reinforcement learning
Markov decision processes
state space
optimal policy
supervised learning
worst case
function approximation
control problems
dynamic programming
policy search
multi-armed bandit
optimal solution
search space
machine learning
fixed point
policy evaluation
model-based reinforcement learning
continuous state and action spaces