Cascaded Gaps: Towards Logarithmic Regret for Risk-Sensitive Reinforcement Learning.
Yingjie Fei, Ruitu Xu. Published in: ICML (2022)
Keyphrases
- risk sensitive
- reinforcement learning
- model free
- optimal control
- Markov decision processes
- reward function
- control policies
- worst case
- Markov decision problems
- reinforcement learning algorithms
- regret bounds
- state space
- optimal policy
- function approximation
- policy iteration
- lower bound
- temporal difference
- dynamic programming
- utility function
- infinite horizon
- partially observable
- Markov decision chains
- finite state
- multi agent
- Markov decision process
- finite horizon
- transfer learning
- decision processes
- learning algorithm
- machine learning
- action selection
- average cost
- control policy
- linear programming
- least squares
- optimality criterion
- efficient optimization