Contextual-MDPs for PAC-Reinforcement Learning with Rich Observations.
Akshay Krishnamurthy, Alekh Agarwal, John Langford
Published in: CoRR (2016)
Keyphrases
- reinforcement learning
- markov decision processes
- state space
- optimal policy
- reinforcement learning algorithms
- markov decision process
- learning algorithm
- contextual information
- continuous state and action spaces
- action sets
- state and action spaces
- function approximation
- reward function
- policy evaluation
- policy search
- policy iteration
- high level
- model free
- machine learning
- control problems
- reinforcement learning methods
- markov decision problems
- continuous state
- multi agent
- action space
- factored markov decision processes
- temporal difference
- finite state
- average cost
- sample size
- statistical queries
- supervised learning
- approximate dynamic programming
- upper bound
- planning under uncertainty
- lower bound
- search space
- learning process
- transition model
- dynamic programming
- least squares
- rl algorithms
- vc dimension
- planning problems
- optimal control
- sample complexity
- action selection