On the Convergence of Monte Carlo UCB for Random-Length Episodic MDPs.
Zixuan Dong, Che Wang, Keith W. Ross
Published in: CoRR (2022)
Keyphrases
- monte carlo
- stochastic approximation
- monte carlo method
- policy evaluation
- markov decision processes
- markov chain
- state space
- stochastic shortest path
- monte carlo tree search
- importance sampling
- temporal difference
- particle filter
- monte carlo simulation
- reinforcement learning
- variance reduction
- matrix inversion
- policy iteration
- finite state
- dynamic programming
- optimal solution
- quasi monte carlo
- temporal difference learning
- least squares
- point processes
- learning algorithm