Order-Optimal Instance-Dependent Bounds for Offline Reinforcement Learning with Preference Feedback.

Zhirui Chen, Vincent Y. F. Tan
Published in: CoRR (2024)
Keyphrases
  • reinforcement learning
  • real time
  • dynamic programming
  • machine learning
  • worst case
  • Markov decision processes
  • optimal control
  • soft constraints
  • tight bounds