Regret Bounds for Markov Decision Processes with Recursive Optimized Certainty Equivalents.
Wenhao Xu, Xuefeng Gao, Xuedong He
Published in: ICML (2023)
Keyphrases
- Markov decision processes
- regret bounds
- state space
- optimal policy
- finite state
- reinforcement learning
- transition matrices
- policy iteration
- decision theoretic planning
- dynamic programming
- average cost
- average reward
- online learning
- action space
- lower bound
- upper bound
- linear regression
- infinite horizon
- reward function
- Markov decision process
- action sets
- linear predictors
- learning algorithm