Soft Policy Gradient Method for Maximum Entropy Deep Reinforcement Learning.
Wenjie Shi, Shiji Song, Cheng Wu
Published in: IJCAI (2019)
Keyphrases
- maximum entropy
- gradient method
- actor critic
- policy gradient
- reinforcement learning
- convergence rate
- maximum entropy principle
- optimal policy
- optimization methods
- step size
- negative matrix factorization
- minimum cross entropy
- conditional random fields
- temporal difference
- reward function
- cost function
- Markov decision processes
- image processing
- reinforcement learning algorithms
- policy iteration
- generative model
- state space