Regret Bounds for Reinforcement Learning with Policy Advice.
Mohammad Gheshlaghi Azar, Alessandro Lazaric, Emma Brunskill. Published in: ECML/PKDD (1) (2013)
Keyphrases
- reinforcement learning
- optimal policy
- multi-armed bandit
- regret bounds
- expert advice
- action selection
- Markov decision process
- state space
- function approximators
- Markov decision processes
- policy iteration
- reward function
- function approximation
- linear regression
- dynamic programming
- lower bound
- model free
- learning algorithm
- online learning
- learning process
- objective function
- similarity measure