Safe Policy Search for Lifelong Reinforcement Learning with Sublinear Regret.

Haitham Bou-Ammar, Rasul Tutunov, Eric Eaton

Published in: CoRR (2015)

Keyphrases

policy search
reinforcement learning
reward function
reinforcement learning algorithms
continuous state
Markov decision processes
lower bound
state space
continuous action
optimal policy
partially observable
function approximation
dynamic programming
multiple agents
policy gradient
game theory
action selection
dynamical systems
function approximators
Markov decision problems
generative model
learning algorithm