Optimal Policies for Observing Time Series and Related Restless Bandit Problems.
Christopher R. Dance, Tomi Silander
Published in: J. Mach. Learn. Res. (2019)
Keyphrases
- optimal policy
- decision problems
- bandit problems
- Markov decision processes
- finite horizon
- multistage
- state space
- finite state
- dynamic programming
- optimal control
- reinforcement learning
- average cost
- infinite horizon
- state-dependent
- average reward
- dynamic programming algorithms
- multi-armed bandits
- Markov decision process
- long run
- influence diagrams
- serial inventory systems
- policy iteration
- lost sales
- average reward reinforcement learning