Algorithms for Slate Bandits with Non-Separable Reward Functions

Jason Rhuggenaath, Alp Akcay, Yingqian Zhang, Uzay Kaymak

Published in: CoRR (2020)

Keyphrases

higher order
inverse reinforcement learning
clustering algorithm
reinforcement learning
maximum likelihood
sufficient conditions