The asymptotic equipartition property in reinforcement learning and its relation to return maximization.
Kazunori Iwata, Kazushi Ikeda, Hideaki Sakai
Published in: Neural Networks (2006)
Keyphrases
- reinforcement learning
- machine learning
- function approximation
- objective function
- state space
- temporal difference
- learning algorithm
- real time
- multi agent
- action space
- dynamic programming
- supervised learning
- asymptotically optimal
- model free
- desirable properties
- temporal difference learning
- reinforcement learning algorithms
- fitted Q iteration
- learning problems
- Markov decision processes
- dynamical systems
- transfer learning
- sufficient conditions
- worst case
- learning process
- information systems
- genetic algorithm