Variational Reward Estimator Bottleneck: Learning Robust Reward Estimator for Multi-Domain Task-Oriented Dialog.
Jeiyoon Park
Chanhee Lee
Kuekyeng Kim
Heuiseok Lim
Published in:
CoRR (2020)
Keyphrases
reinforcement learning
multi domain
learning algorithm
least squares
learning process
active learning
maximum likelihood
prior knowledge
learning tasks
supervised learning
natural language processing
estimation error
inverse reinforcement learning
tutorial dialogue