Learning from an Exploring Demonstrator: Optimal Reward Estimation for Bandits.

Published in: AISTATS (2022)
