On the Occupancy Measure of Non-Markovian Policies in Continuous MDPs.
Romain Laroche, Remi Tachet des Combes. Published in: ICML (2023)
Keyphrases
- Markov decision process
- reward function
- decision processes
- Markov decision processes
- reinforcement learning
- optimal policy
- state space
- Markov decision problems
- action space
- reinforcement learning agents
- average cost
- distance measure
- finite horizon
- temporally extended
- continuous state spaces
- policy search
- decision theoretic planning
- least squares
- decision problems
- control policies
- reinforcement learning algorithms
- long run
- policy iteration
- similarity measure
- continuous state
- sufficient conditions