Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning.
Tengyang Xie, Nan Jiang, Huan Wang, Caiming Xiong, Yu Bai
Published in: CoRR (2021)
Keyphrases
- reinforcement learning
- optimal policy
- real time
- policy search
- cost effective
- online learning
- neural network
- control policy
- function approximation
- learning process
- information technology
- access control
- supervised learning
- sample size
- dynamic programming
- Markov decision processes
- reinforcement learning algorithms
- asymptotically optimal
- Markov decision problems
- multi agent
- partially observable environments