Indexability of Finite State Restless Multi-Armed Bandit and Rollout Policy.
Vishesh Mittal, Rahul Meshram, Deepak Dev, Surya Prakash
Published in: CoRR (2023)
Keyphrases
- finite state
- optimal policy
- multi-armed bandit
- reinforcement learning
- partially observable Markov decision processes
- Markov decision processes
- policy iteration algorithm
- Markov chain
- multi-armed bandits
- average cost
- optimal control
- policy iteration
- dynamic programming
- Markov decision process
- state space
- decision problems
- infinite horizon
- model checking
- continuous state
- long run
- reinforcement learning algorithms
- action space
- multi class
- partially observable
- Markov decision problems
- stationary policies