Sharp Variance-Dependent Bounds in Reinforcement Learning: Best of Both Worlds in Stochastic and Deterministic Environments.
Runlong Zhou, Zihan Zhang, Simon S. Du
Published in: CoRR (2023)
Keyphrases
- reinforcement learning
- direct policy search
- stochastic optimization problems
- variance reduction
- upper bound
- randomized algorithms
- stochastic approximation
- Monte Carlo
- stochastic methods
- lower bound
- stage stochastic programs
- function approximation
- regret bounds
- control policies
- state space
- learning algorithm
- error bounds
- optimal policy
- dynamic environments
- robotic systems
- real world
- lower and upper bounds
- model free
- temporal difference
- deterministic domains
- worst case
- multi agent environments
- continuous state spaces
- partially observable domains
- optimal control
- control policy
- Markov decision process
- covariance matrix
- Markov decision processes
- supervised learning
- multi agent
- objective function
- machine learning