Near-Optimal Regret Bounds for Model-Free RL in Non-Stationary Episodic MDPs.
Weichao Mao, Kaiqing Zhang, Ruihao Zhu, David Simchi-Levi, Tamer Basar
Published in: CoRR (2020)
Keyphrases
- non stationary
- model free
- reinforcement learning
- regret bounds
- policy iteration
- markov decision processes
- policy evaluation
- average reward
- reinforcement learning algorithms
- function approximation
- online learning
- lower bound
- rl algorithms
- linear regression
- state space
- temporal difference
- upper bound
- optimal policy
- markov decision problems
- machine learning
- empirical mode decomposition
- learning algorithm
- markov decision process
- partially observable markov decision processes
- dynamic programming
- learning process
- action space
- optimal control