Convergence and Iteration Complexity of Policy Gradient Method for Infinite-horizon Reinforcement Learning.
Kaiqing Zhang, Alec Koppel, Hao Zhu, Tamer Basar
Published in: CDC (2019)
Keyphrases
- infinite horizon
- optimal policy
- gradient method
- reinforcement learning
- convergence rate
- decision problems
- markov decision process
- actor critic
- markov decision processes
- finite horizon
- optimal control
- policy iteration
- partially observable
- long run
- state space
- dynamic programming
- stochastic demand
- policy gradient
- finite state
- average cost
- convergence speed
- step size
- average reward
- state dependent
- markov decision problems
- fixed cost
- sufficient conditions
- machine learning
- reinforcement learning algorithms
- multistage
- partially observable markov decision processes
- holding cost
- lost sales
- reward function
- model free
- negative matrix factorization
- optimization methods
- total reward
- inventory level
- control policy
- inventory control
- stationary policies
- data mining
- inventory policy