Policy Evaluation for Reinforcement Learning from Human Feedback: A Sample Complexity Analysis.
Zihao Li, Xiang Ji, Minshuo Chen, Mengdi Wang
Published in: AISTATS (2024)
Keyphrases
- complexity analysis
- policy evaluation
- reinforcement learning
- temporal difference
- least squares
- model free
- function approximation
- markov decision processes
- monte carlo
- policy iteration
- variance reduction
- td learning
- theoretical analysis
- sample size
- lower bound
- multi agent
- reinforcement learning algorithms
- optimal policy
- first order logic
- learning algorithm
- semi parametric
- computational complexity
- machine learning
- dynamic programming
- evaluation function
- state space
- partially observable markov decision processes
- optimal control
- step size
- average reward
- supervised learning
- transfer learning