Soft Actor-Critic: Off-Policy Maximum Entropy Deep Reinforcement Learning with a Stochastic Actor.
Tuomas Haarnoja, Aurick Zhou, Pieter Abbeel, Sergey Levine
Published in: CoRR (2018)
Keyphrases
- maximum entropy
- actor critic
- reinforcement learning
- temporal difference
- optimal control
- approximate dynamic programming
- reinforcement learning algorithms
- policy gradient
- maximum entropy principle
- function approximation
- neuro fuzzy
- gradient method
- Markov models
- Monte Carlo
- random fields
- policy iteration
- conditional random fields
- Markov decision processes
- learning algorithm
- model free
- state space
- dynamic programming
- linear program
- optimal policy
- average reward
- temporal difference learning
- active learning
- supervised learning
- step size
- reinforcement learning methods
- RL algorithms
- cost function