Keyphrases
- markov decision processes
- regret bounds
- finite horizon
- optimal policy
- lower bound
- average reward
- average cost
- multi-armed bandit
- infinite horizon
- online learning
- finite state
- state space
- reinforcement learning
- markov decision process
- linear regression
- dynamic programming
- upper bound
- policy iteration
- discounted reward
- long run
- bregman divergences
- online convex optimization
- loss function