Polynomial-Time Reinforcement Learning of Near-Optimal Policies.
Karèn Pivazyan, Yoav Shoham
Published in: AAAI/IAAI (2002)
Keyphrases
- reinforcement learning
- optimal policy
- policy search
- markov decision process
- provably near optimal
- control policies
- state space
- hierarchical reinforcement learning
- fitted q iteration
- reward function
- function approximation
- cooperative multi agent systems
- deterministic domains
- partially observable markov decision processes
- markov decision processes
- policy gradient methods
- total reward
- dynamic programming
- special case
- reinforcement learning agents
- control policy
- model free
- reinforcement learning algorithms
- decision problems
- continuous state
- worst case
- markov decision problems
- approximate policy iteration
- machine learning
- computational complexity
- multi agent
- optimal control
- macro actions
- long run
- supervised learning
- temporal difference learning
- function approximators
- dynamical systems
- average cost
- finite state