Rethinking Kullback-Leibler Divergence in Knowledge Distillation for Large Language Models.
Taiqiang Wu, Chaofan Tao, Jiahao Wang, Zhe Zhao, Ngai Wong
Published in: CoRR (2024)
Keyphrases
- language model
- kullback-leibler divergence
- language modeling
- probabilistic model
- n-gram
- mutual information
- query expansion
- retrieval model
- distance measure
- information retrieval
- smoothing methods
- information theory
- probability density function
- information theoretic
- mixture model
- knowledge discovery
- prior knowledge
- expectation maximization
- data mining techniques
- text mining
- hidden markov models
- relevance model
- machine learning
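The paper's title concerns the Kullback-Leibler divergence used as the distillation objective between a teacher and a student language model. As a point of reference only, the sketch below shows the standard forward-KL distillation loss over token distributions (not the paper's proposed variant); the function name, tensor shapes, and temperature handling are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

def forward_kl_loss(student_logits: torch.Tensor,
                    teacher_logits: torch.Tensor,
                    temperature: float = 1.0) -> torch.Tensor:
    """Forward KL(teacher || student) over the vocabulary dimension.

    Both logit tensors are assumed to have shape (batch, seq_len, vocab).
    """
    # Soften both distributions with the same temperature.
    student_log_probs = F.log_softmax(student_logits / temperature, dim=-1)
    teacher_log_probs = F.log_softmax(teacher_logits / temperature, dim=-1)
    # F.kl_div expects the student in log-space; log_target=True lets the
    # teacher also be passed as log-probabilities for numerical stability.
    loss = F.kl_div(student_log_probs, teacher_log_probs,
                    reduction="batchmean", log_target=True)
    # Conventional temperature^2 scaling keeps gradient magnitudes comparable.
    return loss * (temperature ** 2)

if __name__ == "__main__":
    torch.manual_seed(0)
    student = torch.randn(2, 8, 32000)  # hypothetical LLM vocabulary size
    teacher = torch.randn(2, 8, 32000)
    print(forward_kl_loss(student, teacher, temperature=2.0).item())
```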