Reliability and Learnability of Human Bandit Feedback for Sequence-to-Sequence Reinforcement Learning.
Julia Kreutzer
Joshua Uyheng
Stefan Riezler
Published in:
ACL (1) (2018)
Keyphrases
reinforcement learning
learning process
Markov decision processes
neural network
hidden Markov models
input data