Mutual-Information Regularization in Markov Decision Processes and Actor-Critic Learning.
Felix Leibfried, Jordi Grau-Moya
Published in: CoRL (2019)
Keyphrases
- markov decision processes
- reinforcement learning
- actor critic
- policy iteration
- reinforcement learning algorithms
- average reward
- state space
- learning algorithm
- temporal difference
- optimal policy
- learning tasks
- finite state
- partially observable
- stochastic games
- policy gradient
- real time dynamic programming
- optimal control
- learning problems
- function approximators