On a convergent off-policy temporal difference learning algorithm in on-line learning environment.
Prasenjit Karmakar, Raj Kumar Maity, Shalabh Bhatnagar
Published in: CoRR (2016)
Keyphrases
- temporal difference
- reinforcement learning
- learning environment
- learning algorithm
- reinforcement learning algorithms
- function approximation
- td learning
- learning process
- evaluation function
- supervised learning
- policy evaluation
- model free
- monte carlo
- step size
- temporal difference learning
- action selection
- learning tasks
- e learning
- state space
- policy iteration
- learning rate
- active learning
- function approximators
- temporal difference methods
- transfer learning
- dynamic programming
- reinforcement learning problems
- actor critic
- multiscale
- artificial neural networks
- objective function
- neural network
- basis functions
- markov decision processes