Logarithmic Smoothing for Pessimistic Off-Policy Evaluation, Selection and Learning.

Otmane Sakhi, Imad Aouali, Pierre Alquier, Nicolas Chopin

Published in: CoRR (2024)

Keyphrases

learning algorithm
reinforcement learning
active learning
supervised learning