Optimality of Myopic Policy for Restless Multiarmed Bandit with Imperfect Observation.
Kehao Wang
Published in: GLOBECOM (2016)
Keyphrases
- multiarmed bandit
- infinite horizon
- optimal control
- average cost
- asymptotic optimality
- optimal policy
- asymptotically optimal
- selective perception
- finite horizon
- average reward
- optimal solution
- reinforcement learning
- semi markov
- real time
- long run
- policy making
- conservation laws
- markov decision processes
- multi agent
- case study
- markov decision process
- machine learning
- neural network