Bayesian Learning of Optimal Policies in Markov Decision Processes with Countably Infinite State-Space.
Saghar Adler, Vijay G. Subramanian
Published in: NeurIPS (2023)
Keyphrases
- bayesian learning
- markov decision processes
- optimal policy
- state space
- model selection
- reinforcement learning
- finite state
- dynamic programming
- finite horizon
- infinite horizon
- policy iteration
- long run
- posterior distribution
- decision problems
- heuristic search
- reinforcement learning algorithms
- average reward
- dynamical systems
- markov chain
- initial state
- multistage
- markov decision process
- partially observable
- particle filter
- action space
- search space
- average cost
- semi markov decision processes
- state and action spaces
- state variables
- planning problems
- state abstraction
- reward function
- control policies
- total reward
- queueing networks
- belief state
- sufficient conditions
- probability distribution
- search algorithm