Safe Policy Search for Lifelong Reinforcement Learning with Sublinear Regret.
Haitham Bou-Ammar, Rasul Tutunov, Eric Eaton
Published in: ICML (2015)
Keyphrases
- policy search
- reinforcement learning
- reward function
- reinforcement learning algorithms
- continuous state
- markov decision processes
- state space
- lower bound
- model free
- continuous action
- dynamic programming
- function approximation
- optimal policy
- temporal difference
- partially observable
- policy gradient
- robot navigation
- markov chain
- linear programming
- markov decision problems
- reinforcement learning methods
- average reward
- markov decision process
- random walk
- machine learning
- multiple agents
- transition probabilities
- learning problems