Reliability and Learnability of Human Bandit Feedback for Sequence-to-Sequence Reinforcement Learning.

Julia Kreutzer, Joshua Uyheng, Stefan Riezler
Published in: ACL (1) (2018)
Keyphrases
  • reinforcement learning
  • learning process
  • Markov decision processes
  • neural network
  • hidden Markov models
  • input data