On the Sample Complexity of Learning Infinite-horizon Discounted Linear Kernel MDPs.
Yuanzhou Chen, Jiafan He, Quanquan Gu
Published in: ICML (2022)
Keyphrases
- infinite horizon
- Markov decision processes
- finite horizon
- optimal policy
- partially observable
- reinforcement learning
- dynamic programming
- Markov decision process
- learning algorithm
- average cost
- optimal control
- long run
- stochastic demand
- Dec-POMDPs
- real time
- state space
- policy iteration
- decision making
- real time dynamic programming