On Optimality of Myopic Policy for Restless Multi-Armed Bandit Problem: An Axiomatic Approach.
Kehao Wang, Lin Chen
Published in: IEEE Trans. Signal Process. (2012)
Keyphrases
- infinite horizon
- optimal control
- average cost
- average reward
- optimal policy
- total reward
- selective perception
- dynamic programming
- finite horizon
- optimal solution
- asymptotic optimality
- brute force
- Markov decision processes
- neural network
- partially observable
- policy iteration
- case study
- database
- Markov decision process
- multi-armed bandit problems
- control policy
- information gain
- decision process
- Markov chain
- information technology
- reinforcement learning
- decision trees
- feature selection
- information systems
- machine learning
- real time