Soft Robust MDPs and Risk-Sensitive MDPs: Equivalence, Policy Gradient, and Sample Complexity.
Runyu Zhang, Yang Hu, Na Li
Published in: ICLR (2024)
Keyphrases
- markov decision processes
- sample complexity
- markov decision problems
- reinforcement learning
- average reward
- policy gradient
- reinforcement learning algorithms
- state space
- optimal policy
- average cost
- learning problems
- dynamic programming
- finite state
- finite horizon
- lower bound
- upper bound
- partially observable
- policy iteration
- learning algorithm
- supervised learning
- optimal control
- infinite horizon
- linear programming
- special case
- partially observable markov decision processes
- active learning
- worst case
- markov decision process
- model free
- linear program
- initial state
- action space
- multi agent
- machine learning