Sample Complexity of Estimating the Policy Gradient for Nearly Deterministic Dynamical Systems.
Osbert Bastani
Published in: AISTATS (2020)
Keyphrases
- dynamical systems
- sample complexity
- policy gradient
- reinforcement learning methods
- partially observable markov decision processes
- theoretical analysis
- learning algorithm
- learning problems
- lower bound
- upper bound
- active learning
- special case
- supervised learning
- state space
- reinforcement learning
- generalization error
- training examples
- reinforcement learning algorithms
- sample size
- partially observable
- gradient method
- machine learning
- optimal control
- average reward
- function approximation
- model free
- training samples
- objective function