Policy Finetuning: Bridging Sample-Efficient Offline and Online Reinforcement Learning.
Tengyang Xie
Nan Jiang
Huan Wang
Caiming Xiong
Yu Bai
Published in:
NeurIPS (2021)
Keyphrases
reinforcement learning
optimal policy
real time
computationally efficient
supervised learning
online learning
Markov decision process
function approximation
action selection
balancing exploration and exploitation
state space
state dependent
control policies
partially observable environments