Exponential Bellman Equation and Improved Regret Bounds for Risk-Sensitive Reinforcement Learning.
Yingjie Fei, Zhuoran Yang, Yudong Chen, Zhaoran Wang
Published in: CoRR (2021)
Keyphrases
- risk-sensitive
- reinforcement learning
- optimal control
- model-free
- Markov decision processes
- control policies
- regret bounds
- Markov decision problems
- state space
- optimal policy
- function approximation
- reinforcement learning algorithms
- optimality criterion
- utility function
- learning algorithm
- policy iteration
- machine learning
- expected utility
- reward function
- dynamic programming
- partially observable
- control strategies
- average reward
- infinite horizon
- NP-hard
- mathematical model