Reliability and Learnability of Human Bandit Feedback for Sequence-to-Sequence Reinforcement Learning.

Published in: ACL (1) (2018)

Keyphrases