Equivalence Between Policy Gradients and Soft Q-Learning.
John Schulman, Pieter Abbeel, Xi Chen
Published in: CoRR (2017)
Keyphrases
- optimal policy
- action selection
- reinforcement learning
- policy iteration
- multi agent
- state space
- learning algorithm
- state action
- Markov decision process
- reward function
- Markov decision processes
- continuous state spaces
- function approximation
- decision problems
- control policies
- dynamic programming
- cooperative
- neural network
- temporal difference learning
- state dependent
- actor critic
- model free reinforcement learning
- gradient information
- model free
- average reward
- asymptotically optimal
- stochastic approximation
- partially observable
- policy evaluation
- infinite horizon
- hierarchical reinforcement learning
- approximate policy iteration
- agent receives
- finite state