Reinforcement Learning from Human Feedback without Reward Inference: Model-Free Algorithm and Instance-Dependent Analysis.
Qining Zhang, Honghao Wei, Lei Ying
Published in: CoRR (2024)
Keyphrases
- model free
- reinforcement learning
- learning algorithm
- reinforcement learning algorithms
- function approximation
- dynamic programming
- policy iteration
- temporal difference
- average reward
- search space
- rl algorithms
- state space
- linear programming
- monte carlo
- markov decision processes
- pattern recognition
- bayesian networks
- training data