Provably Efficient Reinforcement Learning for Discounted MDPs with Feature Mapping.
Dongruo Zhou, Jiafan He, Quanquan Gu
Published in: ICML (2021)
Keyphrases
- reinforcement learning
- markov decision processes
- feature mapping
- optimal policy
- markov decision process
- dynamic programming
- state space
- reinforcement learning algorithms
- state and action spaces
- average reward
- action space
- finite horizon
- policy iteration
- average cost
- temporal difference
- manifold learning
- reward function
- data analysis
- infinite horizon
- data sets
- high dimensional data
- nearest neighbor
- pairwise
- learning algorithm
- machine learning