Where's the Reward?

Shayan Doroudi, Vincent Aleven, Emma Brunskill

Published in: Int. J. Artif. Intell. Educ. (2019)

Keyphrases

reinforcement learning
long run
data sets
three dimensional
partially observable environments
neural network
machine learning
natural language
reinforcement learning algorithms
average reward
bandit problems