Planning to Go Out-of-Distribution in Offline-to-Online Reinforcement Learning.
Trevor McInroe, Stefano V. Albrecht, Amos J. Storkey
Published in: CoRR (2023)
Keyphrases
- reinforcement learning
- real time
- online learning
- action selection
- spatial distribution
- reinforcement learning algorithms
- dynamic programming
- machine learning
- random variables
- balancing exploration and exploitation
- reinforcement learning problems
- partial observability
- blocks world
- partially observable
- planning problems
- macro actions
- planning domains
- optimal control
- optimal policy
- decision support
- state space
- probability distribution
- neural network