Instruction Mining: High-Quality Instruction Data Selection for Large Language Models.
Yihan Cao, Yanbin Kang, Lichao Sun. Published in: CoRR (2023)
Keyphrases
- high quality
- language model
- information retrieval
- language modeling
- retrieval model
- document retrieval
- data analysis
- speech recognition
- knowledge discovery
- n-gram
- training data
- data mining
- probabilistic model
- hidden markov models
- text documents
- multimedia
- uncertain data
- search engine
- language models for information retrieval