Soft Policy Gradient Method for Maximum Entropy Deep Reinforcement Learning.
Wenjie Shi, Shiji Song, Cheng Wu
Published in: CoRR (2019)
Keyphrases
- maximum entropy
- gradient method
- actor critic
- policy gradient
- reinforcement learning
- convergence rate
- maximum entropy principle
- negative matrix factorization
- step size
- optimization methods
- optimal policy
- conditional random fields
- minimum cross entropy
- convergence speed
- policy iteration
- function approximators
- natural language processing
- nearest neighbor
- optimal solution
- genetic algorithm