Robust Reinforcement Learning from Corrupted Human Feedback.

Alexander Bukharin, Ilgee Hong, Haoming Jiang, Qingru Zhang, Zixuan Zhang, Tuo Zhao

Published in: CoRR (2024)

Keyphrases

reinforcement learning
Markov decision processes
human operators
machine learning
color images
relevance feedback
optimal policy
learning problems
function approximation
real time
genetic algorithm
artificial intelligence
state space