BiLD: Bi-directional Logits Difference Loss for Large Language Model Distillation.
Minchong Li, Feng Zhou, Xiaohui Song
Published in: CoRR (2024)
Keyphrases
- bi-directional
- language model
- language modeling
- n-gram
- document retrieval
- probabilistic model
- speech recognition
- query expansion
- language modelling
- information retrieval
- retrieval model
- translation model
- test collection
- query terms
- context-sensitive
- vector space model
- cross-lingual
- mixture model
- statistical machine translation
- ad hoc information retrieval
- associative memory
- smoothing methods
- neural network
- document length
- statistical language models
- high-dimensional
- support vector
- comparable corpora
- feature selection
- language model for information retrieval