RLCD: Reinforcement Learning from Contrastive Distillation for LM Alignment.
Kevin Yang, Dan Klein, Asli Celikyilmaz, Nanyun Peng, Yuandong Tian
Published in: ICLR (2024)
Keyphrases
- reinforcement learning
- function approximation
- language model
- model free
- reinforcement learning algorithms
- machine learning
- procrustes analysis
- image alignment
- action selection
- state space
- learning process
- learning algorithm
- language modeling
- multi agent
- temporal difference
- multi agent reinforcement learning
- learning tasks
- learning classifier systems
- information retrieval
- function approximators
- rna sequences
- database