COPF: Continual Learning Human Preference through Optimal Policy Fitting.
Han Zhang
Lin Gui
Yuanzhao Zhai
Hui Wang
Yu Lei
Ruifeng Xu
Published in:
CoRR (2023)
Keyphrases
optimal policy
reinforcement learning
dynamic programming
average reward reinforcement learning
learning algorithm
state space
decision problems
multistage
sufficient conditions
markov decision processes
infinite horizon
state dependent