Improving Exploration in Soft-Actor-Critic with Normalizing Flows Policies.
Patrick Nadeem Ward, Ariella Smofsky, Avishek Joey Bose
Published in: CoRR (2019)
Keyphrases
- actor critic
- policy gradient methods
- reinforcement learning
- optimal control
- temporal difference
- partially observable Markov decision processes
- policy gradient
- natural actor critic
- optimal policy
- approximate dynamic programming
- function approximation
- average reward
- cost function
- gradient method
- least squares
- neuro fuzzy
- policy iteration
- objective function
- decision making