Adversarial Batch Inverse Reinforcement Learning: Learn to Reward from Imperfect Demonstration for Interactive Recommendation.
Jialin Liu, Xinyan Su, Zeyu He, Xiangyu Zhao, Jun Li
Published in: CoRR (2023)
Keyphrases
- inverse reinforcement learning
- partially observable environments
- bayesian nonparametric
- reward function
- preference elicitation
- multi agent
- learning agent
- reinforcement learning
- temporal difference
- objective function
- sufficient conditions
- optimal policy
- markov decision processes
- decision problems
- decision making
- reinforcement learning algorithms
- partially observable
- unsupervised manner