Square-root regret bounds for continuous-time episodic Markov decision processes.
Xuefeng Gao, Xun Yu Zhou
Published in: CoRR (2022)
Keyphrases
- Markov decision processes
- square root
- regret bounds
- state space
- reinforcement learning
- dynamic programming
- optimal policy
- optimal control
- lower bound
- arrival rate
- Markov chain
- linear regression
- Euclidean space
- dynamical systems
- policy iteration
- probability density function
- upper bound
- Markov decision process
- image sequences
- closed form
- steady state