Online Reinforcement Learning for Mixed Policy Scopes.
Junzhe Zhang, Elias Bareinboim
Published in: NeurIPS (2022)
Keyphrases
- reinforcement learning
- optimal policy
- policy search
- action selection
- Markov decision process
- function approximation
- online learning
- partially observable
- control policy
- reinforcement learning algorithms
- real time
- function approximators
- state space
- model free
- partially observable environments
- learning algorithm
- infinite horizon
- temporal difference learning
- reinforcement learning problems
- exploration exploitation tradeoff
- transfer learning
- sufficient conditions
- supervised learning
- Markov decision problems
- mobile robot
- dynamic programming
- agent learns
- multi agent
- machine learning