Safe Policy Search for Lifelong Reinforcement Learning with Sublinear Regret.
Haitham Bou-Ammar, Rasul Tutunov, Eric Eaton
Published in: CoRR (2015)
Keyphrases
- policy search
- reinforcement learning
- reward function
- reinforcement learning algorithms
- continuous state
- Markov decision processes
- lower bound
- state space
- continuous action
- optimal policy
- partially observable
- function approximation
- dynamic programming
- multiple agents
- policy gradient
- game theory
- action selection
- dynamical systems
- function approximators
- Markov decision problems
- generative model
- learning algorithm