Tackling Heavy-Tailed Rewards in Reinforcement Learning with Function Approximation: Minimax Optimal and Instance-Dependent Regret Bounds.
Jiayi Huang, Han Zhong, Liwei Wang, Lin F. Yang
Published in: CoRR (2023)
Keyphrases
- function approximation
- reinforcement learning
- heavy-tailed
- regret bounds
- temporal difference
- model free
- dynamic programming
- function approximators
- state space
- worst case
- Markov decision processes
- control policy
- optimal policy
- learning algorithm
- learning tasks
- Markov decision process
- supervised learning
- optimal solution
- machine learning
- evaluation function
- transfer learning
- Markov random field
- policy iteration
- learning process