Reflect-RL: Two-Player Online RL Fine-Tuning for LMs.
Runlong Zhou, Simon S. Du, Beibin Li
Published in: CoRR (2024)
Keyphrases
- fine tuning
- reinforcement learning
- reinforcement learning algorithms
- learning agents
- model free
- viable alternative
- Markov decision processes
- function approximation
- fine tune
- real time
- learning classifier systems
- learning process
- online learning
- state space
- multi agent
- batch mode
- learning algorithm
- transfer learning
- action selection
- learning management systems
- domain specific
- function approximators
- control policy
- e learning
- rl algorithms
- machine learning