Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis.
Qining Zhang
Honghao Wei
Lei Ying
Published in: CoRR (2024)
Keyphrases
model free
reinforcement learning
learning algorithm
reinforcement learning algorithms
function approximation
dynamic programming
policy iteration
temporal difference
average reward
search space
RL algorithms
state space
linear programming
Monte Carlo
Markov decision processes
pattern recognition
Bayesian networks
training data