A Reinforcement Learning Algorithm with Polynomial Interaction Complexity for Only-Costly-Observable MDPs.
Roy Fox, Moshe Tennenholtz
Published in: AAAI (2007)
Keyphrases
- markov decision processes
- reinforcement learning
- polynomial hierarchy
- vapnik chervonenkis dimension
- human computer interaction
- state space
- computational complexity
- optimal policy
- error prone
- exponential size
- data sets
- computational cost
- user interaction
- dynamic programming
- high cost
- markov decision process
- search algorithm