Keyphrases
- multi-armed bandit
- bandit problems
- reinforcement learning
- optimal policy
- multi-armed bandit problems
- control policies
- decision problems
- model-free reinforcement learning
- regret bounds
- Monte Carlo
- state-dependent
- stochastic processes
- stochastic optimization
- partially observable Markov decision processes
- policy makers
- Kullback-Leibler
- expected cost
- Kullback-Leibler distance
- stochastic model
- stochastic control
- Markov chain
- decision making