Note on discounted continuous-time Markov decision processes with a lower bounding function.
Xin Guo, Alexei B. Piunovskiy, Yi Zhang
Published in: J. Appl. Probab. (2017)
Keyphrases
- Markov decision processes
- lower bounding
- state space
- optimal policy
- lower bound
- dynamic programming
- discount factor
- reinforcement learning
- multi step
- finite state
- branch and bound algorithm
- infinite horizon
- policy iteration
- decision theoretic planning
- lower and upper bounds
- dynamic time warping
- finite horizon
- transition matrices
- average reward
- action space
- optimal control
- mathematical programming
- stationary policies
- Markov decision process
- average cost
- total reward
- euclidean distance
- dynamical systems
- partially observable
- Markov chain
- upper bound
- learning algorithm
- discounted reward
- reinforcement learning algorithms
- reward function
- similarity search
- high dimensional