Reinforcement Learning from LLM Feedback to Counteract Goal Misgeneralization.

Houda Nait El Barj, Théophile Sautory

Published in: CoRR (2024)

Keyphrases

reinforcement learning
function approximation
supervised learning
transfer learning
database
data sets
neural network
machine learning
information retrieval
artificial intelligence
multi-agent
Markov decision processes
learning classifier systems
model-free
temporal difference learning
active exploration