Learning to Plan Variable Length Sequences of Actions with a Cascading Bandit Click Model of User Feedback.
Anirban Santara
Gaurav Aggarwal
Shuai Li
Claudio Gentile
Published in:
AISTATS (2022)
Keyphrases
variable length
user feedback
fixed length
learning algorithm
learning process
supervised learning
user interaction
n-gram
user preferences
bitstream