Bayesian Off-Policy Evaluation and Learning for Large Action Spaces.

Imad Aouali, Victor-Emmanuel Brunel, David Rohde, Anna Korba

Published in: CoRR (2024)

Keyphrases

reinforcement learning
learning algorithm
prior knowledge
policy evaluation
supervised learning
Markov decision processes
Bayesian networks
search algorithm
domain independent
learning tasks
action selection
temporal difference
statistical inference