Adversarial Batch Inverse Reinforcement Learning: Learn to Reward from Imperfect Demonstration for Interactive Recommendation.
Jialin Liu
Xinyan Su
Zeyu He
Jun Li
Published in:
CSCWD (2024)
Keyphrases
inverse reinforcement learning
partially observable environments
Bayesian nonparametric
reward function
preference elicitation
reinforcement learning
Markov decision processes
state space
temporal difference
learning agent
decision making
sufficient conditions
generative model
partially observable