Combining importance sampling and temporal difference control variates to simulate Markov Chains.
R. S. Randhawa, Sandeep Juneja
Published in: ACM Trans. Model. Comput. Simul. (2004)
Keyphrases
- markov chain
- importance sampling
- monte carlo
- temporal difference
- steady state
- finite state
- transition probabilities
- reinforcement learning
- state space
- function approximation
- evaluation function
- markov processes
- policy iteration
- markov chain monte carlo
- variance reduction
- action selection
- confidence intervals
- step size
- model free
- kalman filter
- dynamic programming
- machine learning