Indexability and Rollout Policy for Multi-State Partially Observable Restless Bandits.
Rahul Meshram, Kesav Kaza
Published in: CoRR (2021)
Keyphrases
- state variables
- partially observable
- state space
- reward function
- markov decision problems
- reinforcement learning
- optimal policy
- dynamical systems
- infinite horizon
- dynamic programming
- dynamic systems
- markov decision process
- partially observable environments
- optimal control
- belief state
- fully observable
- partial observability
- markov decision processes
- particle filter
- partial observations
- partially observable markov decision process
- initial state
- action models
- hidden state
- partially observable domains
- decision problems
- action space
- partially observable markov decision processes
- inverse reinforcement learning
- initially unknown
- search space
- policy iteration
- search algorithm