Mean-Variance Policy Iteration for Risk-Averse Reinforcement Learning.
Shangtong Zhang, Bo Liu, Shimon Whiteson
Published in: AAAI (2021)
Keyphrases
- policy iteration
- risk averse
- reinforcement learning
- utility function
- risk aversion
- Markov decision processes
- model free
- temporal difference
- optimal policy
- policy evaluation
- decision makers
- portfolio management
- function approximation
- least squares
- portfolio selection
- reinforcement learning algorithms
- Markov decision process
- expected utility
- finite state
- Markov decision problems
- stochastic programming
- fixed point
- average reward
- portfolio optimization
- state space
- inventory level
- optimal control
- decision problems
- dynamic programming
- machine learning
- decision theory
- linear programming
- learning algorithm
- action space
- function approximators
- probability distribution
- convergence rate
- Monte Carlo
- state dependent
- control policy
- reward function
- action selection
- multistage