Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping.
Dongruo Zhou, Jiafan He, Quanquan Gu. Published in: CoRR (2020)
Keyphrases
- reinforcement learning
- Markov decision processes
- feature mapping
- optimal policy
- Markov decision process
- state space
- dynamic programming
- average reward
- finite horizon
- temporal difference
- average cost
- state and action spaces
- learning algorithm
- reinforcement learning algorithms
- policy iteration
- infinite horizon
- image processing