Mutual-Information Regularization in Markov Decision Processes and Actor-Critic Learning.
Felix Leibfried, Jordi Grau-Moya
Published in: CoRR (2019)
Keyphrases
- Markov decision processes
- reinforcement learning
- actor critic
- policy iteration
- average reward
- reinforcement learning algorithms
- stochastic games
- optimal policy
- state space
- learning algorithm
- least squares
- supervised learning
- partially observable
- real time dynamic programming
- Markov decision process
- reinforcement learning methods
- infinite horizon
- finite state
- function approximation
- learning tasks
- objective function
- temporal difference learning
- policy gradient
- approximate dynamic programming
- feature selection
- machine learning