Exponential Lower Bounds for Batch Reinforcement Learning: Batch RL can be Exponentially Harder than Online RL.
Andrea Zanette
Published in: ICML (2021)
Keyphrases
- reinforcement learning
- online algorithms
- batch mode
- lower bound
- learning algorithm
- online learning
- function approximation
- reinforcement learning algorithms
- active learning
- control policy
- model free
- rl algorithms
- optimal policy
- temporal difference
- reinforcement learning methods
- state space
- supervised learning
- markov decision processes
- continuous state
- upper bound
- dynamic programming
- direct policy search
- state and action spaces
- multi agent
- np hard
- transfer learning
- action space
- exploration exploitation tradeoff
- balancing exploration and exploitation
- partially observable domains
- np complete
- learning classifier systems
- temporal difference learning
- average case
- action selection
- worst case
- multi agent reinforcement learning
- batch learning
- multiagent reinforcement learning
- machine learning
- reward shaping
- learning problems
- average case complexity
- complex domains
- learning agent