Cascaded Gaps: Towards Gap-Dependent Regret for Risk-Sensitive Reinforcement Learning.
Yingjie Fei, Ruitu Xu
Published in: CoRR (2022)
Keyphrases
- risk sensitive
- reinforcement learning
- optimal control
- model free
- Markov decision processes
- reward function
- control policies
- Markov decision problems
- reinforcement learning algorithms
- utility function
- Markov decision chains
- optimal policy
- state space
- function approximation
- lower bound
- average cost
- partially observable
- dynamic programming
- control policy
- temporal difference
- learning algorithm
- policy iteration
- infinite horizon
- finite state
- finite horizon
- supervised learning
- control strategies
- long run
- control strategy
- action space
- optimality criterion
- decision theoretic
- learning tasks
- Monte Carlo
- mobile robot
- multi agent
- machine learning