Order-Optimal Instance-Dependent Bounds for Offline Reinforcement Learning with Preference Feedback.

Zhirui Chen, Vincent Y. F. Tan

Published in: CoRR (2024)

Keyphrases

reinforcement learning
real time
dynamic programming
machine learning
worst case
Markov decision processes
optimal control
soft constraints
tight bounds