Logarithmic regret bounds for continuous-time average-reward Markov decision processes.
Xuefeng Gao, Xun Yu Zhou
Published in: CoRR (2022)
Keyphrases
- regret bounds
- average reward
- Markov decision processes
- state space
- Markov chain
- optimal policy
- lower bound
- semi-Markov decision processes
- policy iteration
- linear regression
- finite state
- discounted reward
- optimality criterion
- stationary policies
- stochastic games
- reinforcement learning
- dynamic programming
- upper bound
- reinforcement learning algorithms
- optimal control
- total reward
- dynamical systems
- action space
- state and action spaces
- partially observable
- average cost
- hierarchical reinforcement learning
- decision theoretic planning
- factored MDPs
- infinite horizon
- search space
- optimal solution
- machine learning
- graphical models
- least squares
- Markov decision process
- partially observable Markov decision processes
- long run
- function approximation
- state variables