Reward-free Policy Imitation Learning for Conversational Search.
Zhenduo Wang
Zhichao Xu
Qingyao Ai
Published in:
CoRR (2023)
Keyphrases
imitation learning
reinforcement learning
search space
multi-modal
video sequences
real time
optimal policy
humanoid robot
Markov decision processes
relational data
long run
maximum margin
reward function
average reward