Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited.
Omar Darwiche Domingues, Pierre Ménard, Emilie Kaufmann, Michal Valko
Published in: CoRR (2020)
Keyphrases
- reinforcement learning
- lower bound
- state and action spaces
- Markov decision processes
- worst case
- upper bound
- action space
- state space
- branch and bound
- Markov decision problems
- reinforcement learning algorithms
- optimal policy
- policy iteration
- branch and bound algorithm
- function approximation
- NP-hard
- average reward
- Markov decision process
- continuous state and action spaces
- partially observable
- policy search
- finite state
- lower and upper bounds
- VC dimension
- policy evaluation
- multi agent
- model based reinforcement learning
- machine learning
- continuous state spaces
- approximate dynamic programming
- continuous state
- objective function
- function approximators
- optimal solution
- action sets
- reward function
- temporal difference
- factored Markov decision processes
- factored MDPs
- action selection
- model free
- infinite horizon
- optimal control
- learning problems
- decision problems
- supervised learning