Policy Expansion for Bridging Offline-to-Online Reinforcement Learning.
Haichao Zhang, Wei Xu, Haonan Yu
Published in: ICLR (2023)
Keyphrases
- reinforcement learning
- optimal policy
- policy search
- online learning
- real time
- action selection
- Markov decision process
- policy gradient
- state and action spaces
- reinforcement learning algorithms
- policy iteration
- approximate dynamic programming
- continuous state
- control policies
- asymptotically optimal
- reinforcement learning problems
- action space
- policy makers
- neural network
- website
- dynamic programming
- state space
- state action
- machine learning
- e-learning
- policy evaluation
- supervised learning
- RL algorithms
- control problems
- batch mode
- control policy
- state dependent
- optimal control
- model-free
- temporal difference
- reward function
- partially observable Markov decision processes