Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments.
Runlong Zhou, Zihan Zhang, Simon Shaolei Du
Published in: ICML (2023)
Keyphrases
- reinforcement learning
- direct policy search
- stochastic optimization problems
- stochastic methods
- variance reduction
- stochastic approximation
- stage stochastic programs
- lower bound
- dynamic environments
- deterministic domains
- state space
- upper bound
- randomized algorithms
- function approximation
- model free
- fully observable
- real world
- Markov decision processes
- upper and lower bounds
- Monte Carlo
- regret bounds
- error bounds
- standard deviation
- temporal difference learning
- partial observability
- machine learning
- high quality
- sample size
- multi agent environments
- covariance matrix
- lower and upper bounds
- control policies
- correlation coefficient
- partially observable
- stochastic model
- VC dimension