Average reward reinforcement learning with unknown mixing times.

Tom Zahavy, Alon Cohen, Haim Kaplan, Yishay Mansour

Published in: CoRR (2019)

Keyphrases

average reward reinforcement learning
optimal policy
artificial intelligence
data sets
neural network
artificial neural networks
image segmentation
database systems
wide range
medical images
orders of magnitude