Inverse-RLignment: Inverse Reinforcement Learning from Demonstrations for LLM Alignment.
Hao Sun
Mihaela van der Schaar
Published in:
CoRR (2024)
Keyphrases
inverse reinforcement learning
Bayesian nonparametric
partially observable environments
preference elicitation
reward function
temporal difference
reinforcement learning
search algorithm
partial order