A short variational proof of equivalence between policy gradients and soft Q learning.
Pierre H. Richemond, Brendan Maginnis
Published in: CoRR (2017)
Keyphrases
- optimal policy
- action selection
- reinforcement learning
- state space
- policy iteration
- cooperative
- continuous state spaces
- image segmentation
- learning algorithm
- markov decision process
- markov decision processes
- function approximation
- optical flow
- reinforcement learning algorithms
- reward function
- multi agent
- long run
- learning rate
- decision problems
- actor critic
- finite state
- state action
- multi agent reinforcement learning
- markov decision problems
- reinforcement learning problems
- state dependent
- machine learning
- asymptotically optimal
- infinite horizon
- theorem proving
- theorem prover
- dynamic programming
- partially observable markov decision processes
- control policy
- model free
- policy search