Regularized Robust MDPs and Risk-Sensitive MDPs: Equivalence, Policy Gradient, and Sample Complexity.
Runyu Zhang, Yang Hu, Na Li
Published in: CoRR (2023)
Keyphrases
- markov decision processes
- sample complexity
- markov decision problems
- reinforcement learning
- policy gradient
- reinforcement learning algorithms
- average reward
- average cost
- state space
- learning problems
- optimal policy
- supervised learning
- finite state
- finite horizon
- dynamic programming
- policy iteration
- linear programming
- partially observable markov decision processes
- reward function
- learning algorithm
- generalization error
- infinite horizon
- special case
- action space
- active learning
- lower bound
- sample size
- markov decision process
- long run
- function approximation
- initial state
- least squares
- machine learning
- model checking