Sample Complexity of Estimating the Policy Gradient for Nearly Deterministic Dynamical Systems.

Published in: AISTATS (2020)

Keyphrases

dynamical systems
sample complexity
policy gradient
reinforcement learning methods
partially observable Markov decision processes
theoretical analysis
learning algorithm
learning problems
lower bound
upper bound
active learning
special case
supervised learning
state space
reinforcement learning
generalization error
training examples
reinforcement learning algorithms
sample size
partially observable
gradient method
machine learning
optimal control
average reward
function approximation
model free
training samples
objective function