Tiered Reinforcement Learning: Pessimism in the Face of Uncertainty and Constant Regret.

Jiawei Huang, Li Zhao, Tao Qin, Wei Chen, Nan Jiang, Tie-Yan Liu

Published in: CoRR (2022)

Keyphrases

reinforcement learning
minimax regret
partial observability
reinforcement learning algorithms
online learning
function approximation
uncertain data
face images
human faces
lower bound
total reward
learning algorithm
state space
reward function
machine learning
facial expressions
markov decision processes
probability distribution
neural network
facial images
utility function
loss function
belief functions
decision theory
model free
least squares
robust optimization
dynamic programming