Bayesian Learning of Optimal Policies in Markov Decision Processes with Countably Infinite State-Space.
Saghar Adler, Vijay G. Subramanian
Published in: CoRR (2023)
Keyphrases
- bayesian learning
- markov decision processes
- optimal policy
- state space
- model selection
- finite state
- finite horizon
- reinforcement learning
- dynamic programming
- decision problems
- average reward
- policy iteration
- heuristic search
- reinforcement learning algorithms
- action space
- long run
- infinite horizon
- posterior distribution
- multistage
- markov decision process
- partially observable
- markov chain
- dynamical systems
- average cost
- search space
- total reward
- reward function
- planning problems
- particle filter
- initial state
- control policies
- state variables
- state abstraction
- belief state
- hyperparameters
- markov decision problems
- policy evaluation
- discounted reward