Horizon-Free Reinforcement Learning in Polynomial Time: the Power of Stationary Policies.
Zihan Zhang, Xiangyang Ji, Simon S. Du
Published in: CoRR (2022)
Keyphrases
- reinforcement learning
- action sets
- Markov decision processes
- stationary policies
- state space
- Markov decision process
- special case
- optimal policy
- function approximation
- learning algorithm
- dynamic programming
- multi-agent
- finite state
- temporal difference
- reward function
- partially observable
- model-free
- function approximators
- learning tasks
- lower bound