Login / Signup
Research Is Its Own Reward.
Scott Davidson
Published in:
IEEE Des. Test (2017)
Keyphrases
</>
reinforcement learning
databases
long run
reward function
real time
learning algorithm
computational complexity
sufficient conditions
average reward
bandit problems
partially observable environments