Reflect-RL: Two-Player Online RL Fine-Tuning for LMs.

Runlong Zhou, Simon S. Du, Beibin Li

Published in: CoRR (2024)

Keyphrases

fine tuning
reinforcement learning
reinforcement learning algorithms
learning agents
model free
viable alternative
markov decision processes
function approximation
fine tune
real time
learning classifier systems
learning process
online learning
state space
multi agent
batch mode
learning algorithm
transfer learning
action selection
learning management systems
domain specific
function approximators
control policy
e learning
rl algorithms
machine learning