Provably Good Batch Reinforcement Learning Without Great Exploration

Yao Liu, Adith Swaminathan, Alekh Agarwal, Emma Brunskill

Published in: CoRR (2020)

Keyphrases

reinforcement learning
active exploration
exploration strategy
action selection
model based reinforcement learning
batch mode
exploration exploitation
autonomous learning
function approximation
state space
reinforcement learning algorithms
Markov decision processes
machine learning
optimal policy
balancing exploration and exploitation
batch learning
exploration exploitation tradeoff
worst case
temporal difference
batch size
learning algorithm
batch processing
temporal difference learning
model free
incremental learning
dynamic programming
objective function