Regret Bounds for Markov Decision Processes with Recursive Optimized Certainty Equivalents.
Wenhao Xu, Xuefeng Gao, Xuedong He. Published in: CoRR (2023)
Keyphrases
- Markov decision processes
- regret bounds
- finite state
- state space
- optimal policy
- transition matrices
- dynamic programming
- reinforcement learning
- policy iteration
- decision theoretic planning
- online learning
- average cost
- average reward
- infinite horizon
- action space
- upper bound
- reward function
- lower bound
- Markov decision process
- model free
- recommender systems
- Bayesian networks