Keyphrases
- error bounds
- reinforcement learning
- optimal policy
- theoretical analysis
- Markov decision process
- policy search
- function approximation
- worst case
- reward function
- state space
- Markov decision processes
- finite sample
- control policies
- fitted Q iteration
- temporal difference
- complex environments
- reinforcement learning algorithms
- optimal control
- learning process
- reinforcement learning agents
- policy gradient methods
- partially observable
- partially observable Markov decision processes
- dynamic environments
- Markov decision problems
- multi-agent reinforcement learning
- state abstraction
- learning algorithm
- hierarchical reinforcement learning
- machine learning