Exponential Bellman Equation and Improved Regret Bounds for Risk-Sensitive Reinforcement Learning.
Yingjie Fei, Zhuoran Yang, Yudong Chen, Zhaoran Wang
Published in: NeurIPS (2021)
Keyphrases
- risk sensitive
- reinforcement learning
- optimal control
- model free
- Markov decision processes
- Markov decision problems
- dynamic programming
- control policies
- state space
- function approximation
- utility function
- reinforcement learning algorithms
- regret bounds
- policy iteration
- learning algorithm
- partially observable
- decision theoretic
- action space
- finite state
- linear regression
- upper bound