Single-Timescale Actor-Critic Provably Finds Globally Optimal Policy.
Zuyue Fu, Zhuoran Yang, Zhaoran Wang
Published in: CoRR (2020)
Keyphrases
- optimal policy
- average reward
- actor critic
- reinforcement learning
- policy iteration
- Markov decision processes
- decision problems
- finite horizon
- infinite horizon
- state space
- dynamic programming
- long run
- multistage
- finite state
- Markov decision process
- partially observable Markov decision processes
- sufficient conditions
- optimal control
- function approximation
- reinforcement learning algorithms
- policy gradient
- temporal difference
- model free
- probability distribution
- neuro fuzzy
- linear programming
- search algorithm
- approximate dynamic programming
- data mining