Regret Bounds for Reinforcement Learning with Policy Advice
Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill
Published in: CoRR (2013)
Keyphrases
- reinforcement learning
- optimal policy
- multi-armed bandit
- regret bounds
- expert advice
- Markov decision process
- action selection
- function approximators
- function approximation
- Markov decision processes
- state space
- policy iteration
- reward function
- model-free
- lower bound
- online learning
- machine learning
- long run
- temporal difference
- maximum entropy
- linear regression
- multi-class
- decision trees
- learning algorithm