Episodic Reinforcement Learning in Finite MDPs: Minimax Lower Bounds Revisited.
Omar Darwiche Domingues, Pierre Ménard, Emilie Kaufmann, Michal Valko
Published in: ALT (2021)
Keyphrases
- reinforcement learning
- lower bound
- state and action spaces
- markov decision processes
- worst case
- upper bound
- action space
- state space
- markov decision problems
- optimal policy
- branch and bound
- branch and bound algorithm
- objective function
- policy iteration
- reinforcement learning algorithms
- vc dimension
- function approximation
- lower and upper bounds
- partially observable
- markov decision process
- continuous state and action spaces
- finite state
- policy search
- dynamic programming
- approximate dynamic programming
- sample complexity
- model free
- np hard
- factored markov decision processes
- factored mdps
- decision theoretic planning
- continuous state spaces
- continuous state
- average reward
- supervised learning
- reward function
- machine learning
- policy evaluation
- learning problems
- finite number