Efficient Two-Phase Offline Deep Reinforcement Learning from Preference Feedback.

Yinglun Xu, Gagandeep Singh
Published in: CoRR (2024)
Keyphrases
  • reinforcement learning
  • data sets
  • databases
  • search algorithm
  • supervised learning
  • lightweight
  • database
  • knowledge base
  • data structure
  • learning process
  • computationally efficient
  • optimal policy
  • multi attribute