Approximate stochastic annealing for online control of infinite horizon Markov decision processes.
Jiaqiao Hu, Hyeong Soo Chang
Published in: Autom. (2012)
Keyphrases
- infinite horizon
- Markov decision processes
- finite horizon
- optimal control
- optimal policy
- control policies
- policy iteration
- dynamic programming
- state space
- reinforcement learning
- finite state
- policy iteration algorithm
- average cost
- policy evaluation
- long run
- production planning
- partially observable
- single item
- Markov decision process
- Monte Carlo
- reinforcement learning algorithms
- action space
- average reward
- control system
- Dec-POMDPs
- control strategy
- planning under uncertainty
- least squares
- discount factor
- reward function
- decision processes
- state dependent
- Markov decision problems
- stationary policies
- multistage
- total reward