An asymptotically optimal policy for finite support models in the multiarmed bandit problem.
Junya Honda, Akimichi Takemura
Published in: Mach. Learn. (2011)
Keyphrases
- optimal policy
- multiarmed bandit
- stochastic inventory control
- multistage
- infinite horizon
- decision problems
- reinforcement learning
- asymptotically optimal
- dynamic programming
- finite horizon
- control policies
- long run
- state dependent
- Markov decision processes
- finite state
- state space
- Markov decision process
- finite number
- learning algorithm