Reliability and Learnability of Human Bandit Feedback for Sequence-to-Sequence Reinforcement Learning.
Julia Kreutzer
Joshua Uyheng
Stefan Riezler
Published in:
CoRR (2018)
Keyphrases
reinforcement learning
machine learning
uniform distribution
neural network
state space
input data
markov chain
inductive logic programming
hidden state