Maximum Entropy On-Policy Actor-Critic via Entropy Advantage Estimation.
Jean Seong Bjorn Choe, Jong-Kook Kim. Published in: CoRR (2024)
Keyphrases
- maximum entropy
- actor critic
- policy gradient
- reinforcement learning
- optimal control
- maximum entropy principle
- approximate dynamic programming
- temporal difference
- policy iteration
- markov models
- gradient method
- neuro fuzzy
- reinforcement learning algorithms
- policy gradient methods
- random fields
- average reward
- function approximation
- markov decision processes
- information theoretic
- conditional random fields
- convergence rate
- average cost
- dynamical systems
- supervised learning
- dynamic programming
- pairwise