RS-DPO: A Hybrid Rejection Sampling and Direct Preference Optimization Method for Alignment of Large Language Models.
Saeed Khaki, JinJin Li, Lan Ma, Liu Yang, Prathap Ramachandra
Published in: CoRR (2024)
Keyphrases
- optimization method
- language model
- language modeling
- optimization algorithm
- optimization methods
- optimization process
- probabilistic model
- genetic algorithm
- simulated annealing
- n gram
- evolutionary algorithm
- language modelling
- document retrieval
- speech recognition
- retrieval model
- query expansion
- differential evolution
- information retrieval
- metaheuristic
- optimization procedure
- particle swarm
- statistical language models
- test collection
- context sensitive
- nelder mead simplex
- smoothing methods
- mixture model
- user preferences
- ad hoc information retrieval
- particle swarm optimization
- vector space model
- translation model
- optimal solution
- query terms
- document ranking
- word level
- neural network
- language models for information retrieval
- spoken term detection