Regret Bounds and Reinforcement Learning Exploration of EXP-based Algorithms.

Mengfan Xu, Diego Klabjan

Published in: CoRR (2020)

Keyphrases

reinforcement learning
learning algorithm
computational complexity
lower bound
multi-armed bandit
mutual information
policy iteration