Reward learning from human preferences and demonstrations in Atari

Borja Ibarz, Jan Leike, Tobias Pohlen, Geoffrey Irving, Shane Legg, Dario Amodei

Published in: CoRR (2018)

Keyphrases

reinforcement learning
learning process
learning algorithm
learning tasks
supervised learning
active learning
function approximation
partially observable