Inverse-RLignment: Inverse Reinforcement Learning from Demonstrations for LLM Alignment.

Hao Sun, Mihaela van der Schaar
Published in: CoRR (2024)
Keyphrases
  • inverse reinforcement learning
  • bayesian nonparametric
  • partially observable environments
  • preference elicitation
  • reward function
  • temporal difference
  • reinforcement learning
  • search algorithm
  • partial order