Multi-Alpha Soft Actor-Critic: Overcoming Stochastic Biases in Maximum Entropy Reinforcement Learning.
Conor Igoe, Swapnil Pande, Siddarth Venkatraman, Jeff G. Schneider
Published in: ICRA (2023)
Keyphrases
- maximum entropy
- actor critic
- reinforcement learning
- temporal difference
- policy gradient
- reinforcement learning algorithms
- approximate dynamic programming
- maximum entropy principle
- optimal control
- neuro fuzzy
- function approximation
- Markov models
- Monte Carlo
- policy iteration
- gradient method
- average reward
- random fields
- Markov decision processes
- model free
- state space
- conditional random fields
- learning algorithm
- Bregman divergences
- machine learning
- linear programming
- supervised learning
- temporal difference learning
- reinforcement learning methods
- optimal policy
- Markov chain
- active learning
- prior knowledge