Distribution Estimation in Discounted MDPs via a Transformation.
Shuai Ma, Jia Yuan Yu
Published in: CoRR (2018)
Keyphrases
- markov decision processes
- optimal policy
- average cost
- reinforcement learning
- infinite horizon
- dynamic programming
- finite horizon
- state space
- finite state
- decision theoretic planning
- average reward
- partially observable
- reinforcement learning algorithms
- spatial distribution
- markov decision process
- estimation accuracy
- policy iteration
- parameter estimation
- decision theoretic
- estimation algorithm
- probability density function
- data distribution
- markov decision problems