Provable Safe Reinforcement Learning with Binary Feedback.

Andrew Bennett, Dipendra Misra, Nathan Kallus

Published in: AISTATS (2023)

Keyphrases

stochastic approximation
reinforcement learning
Monte Carlo
temporal difference learning
database
feedback loop
user feedback
function approximation
feedback mechanisms
assessment tool
action space
non-binary
temporal difference
Hamming distance
transfer learning
optimal policy
relevance feedback
case study
learning algorithm
machine learning
data sets