Off-Policy Q-Learning for Infinite Horizon LQR Problem with Unknown Dynamics.
Xinxing Li, Zhihong Peng, Li Liang
Published in: ISIE (2018)
Keyphrases
- infinite horizon
- optimal control
- optimal policy
- state space
- policy iteration
- reinforcement learning
- dynamic programming
- finite horizon
- Markov decision processes
- long run
- stochastic demand
- production planning
- dynamical systems
- partially observable
- decision problems
- function approximation
- single item
- Markov decision process
- Dec-POMDPs
- finite state
- learning algorithm
- multistage
- reinforcement learning algorithms
- control strategy
- multi agent
- average reward
- sufficient conditions
- holding cost
- search algorithm
- state dependent
- inventory level
- initial state
- action selection
- fixed cost
- lost sales
- learning rate
- closed loop