Risk-Sensitive Deep RL: Variance-Constrained Actor-Critic Provably Finds Globally Optimal Policy.
Han Zhong, Ethan X. Fang, Zhuoran Yang, Zhaoran Wang
Published in: CoRR (2020)
Keyphrases
- actor critic
- optimal policy
- reinforcement learning
- policy iteration
- optimal control
- Markov decision processes
- Markov decision problems
- infinite horizon
- average reward
- control policies
- dynamic programming
- reinforcement learning algorithms
- average cost
- decision problems
- finite horizon
- state space
- policy gradient
- finite state
- long run
- Markov decision process
- multistage
- temporal difference
- rl algorithms
- model free
- function approximation
- reward function
- gradient method
- control strategy
- partially observable Markov decision processes
- initial state
- machine learning
- action space
- action selection
- control policy
- sufficient conditions
- learning algorithm