Tackling Heavy-Tailed Rewards in Reinforcement Learning with Function Approximation: Minimax Optimal and Instance-Dependent Regret Bounds.
Jiayi Huang, Han Zhong, Liwei Wang, Lin Yang
Published in: NeurIPS (2023)
Keyphrases
- function approximation
- reinforcement learning
- heavy-tailed
- regret bounds
- dynamic programming
- temporal difference
- control policy
- model free
- function approximators
- Markov decision processes
- worst case
- state space
- learning algorithm
- machine learning
- optimal solution
- learning process
- optimal policy
- high dimensional
- learning tasks
- radial basis function
- reward function