Adaptive importance sampling for value function approximation in off-policy reinforcement learning.
Hirotaka Hachiya, Takayuki Akiyama, Masashi Sugiyama, Jan Peters
Published in: Neural Networks (2009)
Keyphrases
- importance sampling
- reinforcement learning
- monte carlo
- temporal difference
- state space
- markov chain
- particle filter
- kalman filter
- temporal difference learning
- rare events
- function approximation
- particle filtering
- machine learning
- approximate inference
- posterior distribution
- markov decision processes
- variance reduction
- learning algorithm
- markov chain monte carlo
- model free
- reinforcement learning algorithms
- state action
- closed form
- gaussian process
- graphical models
- semi supervised