Tiered Reinforcement Learning: Pessimism in the Face of Uncertainty and Constant Regret.
Jiawei Huang, Li Zhao, Tao Qin, Wei Chen, Nan Jiang, Tie-Yan Liu
Published in: CoRR (2022)
Keyphrases
- reinforcement learning
- minimax regret
- partial observability
- reinforcement learning algorithms
- online learning
- function approximation
- uncertain data
- face images
- human faces
- lower bound
- total reward
- learning algorithm
- state space
- reward function
- machine learning
- facial expressions
- Markov decision processes
- probability distribution
- neural network
- facial images
- utility function
- loss function
- belief functions
- decision theory
- model free
- least squares
- robust optimization
- dynamic programming