Policy Search in Infinite-Horizon Discounted Reinforcement Learning: Advances through Connections to Non-Convex Optimization: Invited Presentation.
Kaiqing Zhang, Alec Koppel, Hao Zhu, Tamer Basar
Published in: CISS (2019)
Keyphrases
- convex optimization
- policy search
- infinite horizon
- reinforcement learning
- optimal policy
- markov decision processes
- dynamic programming
- reinforcement learning algorithms
- markov decision problems
- optimal control
- partially observable markov decision processes
- continuous state
- finite horizon
- state space
- reward function
- long run
- partially observable
- finite state
- action space
- total variation
- function approximation
- average cost
- markov decision process
- policy iteration
- decision problems
- state dependent
- policy gradient
- multistage
- temporal difference
- average reward
- model free
- lead time
- initial state
- stochastic games
- learning algorithm