Discovering diverse solutions in deep reinforcement learning by maximizing state-action-based mutual information.
Takayuki Osa, Voot Tangkaratt, Masashi Sugiyama
Published in: Neural Networks (2022)
Keyphrases
- state action
- reinforcement learning
- mutual information
- evaluation function
- action space
- continuous state
- markov decision process
- function approximation
- similarity measure
- state space
- average reward
- state transitions
- function approximators
- multi agent
- machine learning
- stochastic games
- model free
- reinforcement learning algorithms
- optimal policy
- learning process
- temporal difference
- learning algorithm
- policy gradient
- belief state
- neural network