Policy Gradient Based Entropic-VaR Optimization in Risk-Sensitive Reinforcement Learning.
Xinyi Ni, Lifeng Lai
Published in: Allerton (2022)
Keyphrases
- risk sensitive
- reinforcement learning
- model free
- control policies
- Markov decision problems
- Markov decision processes
- optimal policy
- optimal control
- policy iteration
- Markov decision process
- state space
- action space
- function approximation
- reinforcement learning algorithms
- partially observable
- control policy
- optimization problems
- average cost
- decision processes
- reward function
- temporal difference
- infinite horizon
- machine learning
- utility function
- stochastic optimization
- function approximators
- optimization method
- finite state
- multistage
- action selection
- decision theoretic
- dynamic programming
- average reward
- control system
- optimality criterion
- learning algorithm
- decision problems
- finite horizon
- sufficient conditions
- linear programming
- multi agent