Algorithms for slate bandits with non-separable reward functions.
Jason Rhuggenaath
Alp Akcay
Yingqian Zhang
Uzay Kaymak
Published in:
CoRR (2020)
Keyphrases
higher order
inverse reinforcement learning
clustering algorithm
reinforcement learning
maximum likelihood
sufficient conditions