Reflect-RL: Two-Player Online RL Fine-Tuning for LMs.
Runlong Zhou, Simon S. Du, Beibin Li
Published in: ACL (1) (2024)
Keyphrases
- fine tuning
- reinforcement learning
- reinforcement learning algorithms
- learning agents
- online learning
- model free
- optimal policy
- fine tune
- viable alternative
- function approximation
- learning process
- Markov decision processes
- batch mode
- multi agent
- partially observable domains
- exploration exploitation tradeoff
- fine tuned
- learning management systems
- state space
- learning classifier systems
- temporal difference
- Markov decision process
- higher education
- e learning
- autonomous learning
- machine learning
- long run
- learning problems
- Monte Carlo
- active learning
- real time